

AD-A167 700


DTIC/TR-86/10 




FIRST  CONFERENCE  ON 
COMPUTER  INTERFACES  AND  INTERMEDIARIES 
FOR  INFORMATION  RETRIEVAL: 
SELECTED  PAPERS 


Held 

October  3-6,  1984 
Williamsburg,  Virginia 


Cameron  Station,  Alexandria,  VA  22304-6145 


REPORT DOCUMENTATION PAGE

1a. REPORT SECURITY CLASSIFICATION: UNCLASSIFIED
1b. RESTRICTIVE MARKINGS:
2a. SECURITY CLASSIFICATION AUTHORITY:
2b. DECLASSIFICATION / DOWNGRADING SCHEDULE:
3. DISTRIBUTION / AVAILABILITY OF REPORT: Approved for public release;
   distribution unlimited
4. PERFORMING ORGANIZATION REPORT NUMBER(S): DTIC/TR-86/10
5. MONITORING ORGANIZATION REPORT NUMBER(S):
6a. NAME OF PERFORMING ORGANIZATION: Defense Technical Information Center
6b. OFFICE SYMBOL (If applicable): DTIC
6c. ADDRESS (City, State, and ZIP Code): Cameron Station,
    Alexandria, VA 22304-6145
7a. NAME OF MONITORING ORGANIZATION:
7b. ADDRESS (City, State, and ZIP Code):
8a. NAME OF FUNDING / SPONSORING ORGANIZATION:
8b. OFFICE SYMBOL (If applicable):
8c. ADDRESS (City, State, and ZIP Code):
9. PROCUREMENT INSTRUMENT IDENTIFICATION NUMBER:
10. SOURCE OF FUNDING NUMBERS (PROGRAM ELEMENT NO., PROJECT NO.,
    WORK UNIT ACCESSION NO.): 65801S
11. TITLE (Include Security Classification): Selected Papers from the First
    Conference on Computer Interfaces and Intermediaries for Information
    Retrieval
12. PERSONAL AUTHOR(S): Marjorie E. Powell, Editor
13a. TYPE OF REPORT:
13b. TIME COVERED: FROM ___ TO ___
14. DATE OF REPORT (Year, Month, Day):
16. SUPPLEMENTARY NOTATION:
17. COSATI CODES (FIELD, GROUP, SUB-GROUP):
18. SUBJECT TERMS (Continue on reverse if necessary and identify by block
    number): Intermediary, Information Systems, Data Bases, Expert
    Systems, Artificial Intelligence, Conference Proceedings
19. ABSTRACT (Continue on reverse if necessary and identify by block number):

The Defense Technical Information Center sponsored a conference on Computer
Interfaces and Intermediaries for Information Retrieval to promote exchange
and dissemination of research efforts leading toward improvement of
information retrieval. Topics included in the selected papers cover the
general area of information retrieval. Human-computer interaction, common
command language, intermediary systems, and future directions in artificial
intelligence were the session topics.

20. DISTRIBUTION / AVAILABILITY OF ABSTRACT: [X] UNCLASSIFIED/UNLIMITED
    [ ] SAME AS RPT.  [ ] DTIC USERS
21. ABSTRACT SECURITY CLASSIFICATION:
22a. NAME OF RESPONSIBLE INDIVIDUAL: Marjorie E. Powell
22c. OFFICE SYMBOL:

83 APR edition may be used until exhausted.


PREFACE 


On 3-6 October 1984 the Defense Technical Information Center sponsored,
with the support of the American Defense Preparedness Association, a conference
entitled "Computer Interfaces and Intermediaries for Information Retrieval."
The purpose of the conference was to bring together experts in the field of user
interfaces to promote a sharing of the results of developmental efforts.

Six papers, one abstract, and one summary are included in this group of
selected papers. The selection of papers reflects availability
rather than judgment. The sessions of the conference were recorded and
transcribed;  as  is  often  the  case,  the  transcriptions  were  of  uneven  quality, 
presenting  extreme  difficulty  in  working  some  of  the  presentations  into  suitable 
papers.  Not  included  in  this  volume,  but  of  equal  value  to  the  conference,  were 
presentations  by  Martha  Williams,  as  keynote;  Carol  Fenichel;  Charles  Hildreth; 
Michael  Monahan;  Alan  Negus;  Viktor  Hampel;  Rita  Bergman;  Tamas  Doszkocs;  David 
Toliver;  Lionel  Bernstein;  and  Gabriel  Jakobson. 

The  Second  Conference  on  Computer  Interfaces  and  Intermediaries  for 
Information  Retrieval  was  held  in  Boston  28-31  May  86.  The  majority  of  the 
speakers  who  presented  papers  at  the  first  conference  on  interfaces  and 
intermediaries  responded  to  the  invitation  to  report  on  progress  in  their  work. 
They  were  joined  by  other  distinguished  researchers  in  the  field.  The 
proceedings  of  the  second  conference  on  computer  interfaces  and  intermediaries 
will  be  available  by  the  end  of  summer  86. 


TABLE  OF  CONTENTS 


Session I
Automated Information Systems: The Human Element

Some Design Ideas for Subject Access in Online Systems
by Marcia J. Bates

Human-Computer Interaction Research and Information
Retrieval Systems (abstract)
by Christine Borgman

The User Interface: Some Preliminary Results from
the Dartmouth Online Catalog
by Emily Fayen

Research in Search Models
by V. David Penniman


Session II
Command Languages

Integration of Common Command Languages with Intelligent Gateways
by Hilary D. Burton


Session III
Intermediary Systems

Intermediary Systems for Information Retrieval
by Richard S. Marcus


Session IV
Artificial Intelligence, Future Directions

Menu-based Natural Language Interfaces to Databases
by Craig Thompson

Summary of Plenary Session on Expert Knowledge Systems
by Linda C. Smith


Appendices:

A. Agenda From Computer Interfaces and Intermediaries
   for Information Retrieval Conference

B. Author Index


SOME  DESIGN  IDEAS  FOR  SUBJECT  ACCESS  IN  ONLINE  SYSTEMS 


Marcia  J.  Bates 
Associate  Professor 

Graduate  School  of  Library  and  Information 
Science 

University  of  California,  Los  Angeles 


Marcia  J.  Bates  is  an  Associate  Professor  with  the  Graduate  School  of 
Library  and  Information  Science  at  the  University  of  California  at  Los 
Angeles.  She  has  an  MLS  and  PhD  from  the  University  of  California,  Berkeley. 
She  has  conducted  research  on  subject  access  in  catalogs,  and  has  written 
widely  on  search  strategy  in  online  and  manual  information  systems. 


Dr.  Marcia  J.  Bates 

Underlying  much  current  interface  design  are  a  number  of  unconscious 
assumptions.  I  want  to  challenge  some  of  those  assumptions.  I  would  like  us 
to  go  back  to  the  function  of  information  retrieval  itself  and  ask  how  we 
might  best  design  it.  We're  moving  forward  very  rapidly  with  the  existing 
traditional  information  system  design  structure;  I'm  not  at  all  convinced 
that  that  structure  is  the  best  use  of  available  technology.  So  I  am  going 
to  address  some  general  principles  of  information  systems  design  and  draw 
implications  for  interface  design. 

My first point is that we should design generous systems. I'll say more
in a moment about what I mean by that term in an operational sense, but for
the moment, let us look at some existing systems.

Current  information  systems  are,  as  a  rule,  not  generous.  The  prime 
example  of  an  ungenerous  system  is  the  traditional  card  catalog,  which, 
between  its  subject  headings  and  its  cross  references,  averages  something 
like  two  subject  access  points  per  document.  In  a  manual  catalog,  only  the 
first  word  of  each  subject  heading  constitutes  an  access  point,  because  there 
is  no  way  to  get  access  to  words  that  appear  later  in  the  heading. 

Online  catalogs  have  been  a  vast  improvement  on  manual  catalogs,  not  so 
much  because  system  designers  changed  the  indexing,  which  they  did  not,  but 
because  the  searcher  can  treat  individual  words  in  titles  and  subject 
headings  as  access  points  instead  of  just  the  first  word  in  the  heading  or 
title.  Even  this  simple  step  has  been  such  a  vast  improvement  on  the 
previously  stingy  access  available  that  online  catalogs  have  gotten  high 
rates  of  use  and  enthusiasm. 
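The difference between first-word access and word-level access can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the subject headings and the function names `first_word_index` and `keyword_index` are invented here and are not taken from any catalog discussed in the talk.

```python
from collections import defaultdict

# Hypothetical subject headings, invented for illustration only.
HEADINGS = [
    "Information storage and retrieval systems",
    "Libraries and computers",
    "Online library catalogs",
]


def first_word_index(headings):
    """Manual-catalog style: only the first word of a heading is an access point."""
    index = defaultdict(set)
    for heading in headings:
        index[heading.split()[0].lower()].add(heading)
    return index


def keyword_index(headings):
    """Online-catalog style: every word of a heading is an access point."""
    index = defaultdict(set)
    for heading in headings:
        for word in heading.lower().split():
            index[word].add(heading)
    return index


manual = first_word_index(HEADINGS)
online = keyword_index(HEADINGS)

# "retrieval" is not the first word of any heading, so the manual-style
# index has no entry for it, while the word-level index does.
print("retrieval" in manual)
print(sorted(online["retrieval"]))
```

The indexing itself is unchanged between the two functions; only the number of ways into each heading differs, which is the point made above.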

But  we  still  have  a  long  way  to  go.  The  results  reported  in  the 
Matthews  book  on  the  national  online  catalog  use  survey  sponsored  by  CLR 
contain the following figures: An aggregate 28 percent of the users found
all  or  most  of  what  they  were  looking  for  and  18  percent  found  more  than  what 
they  were  looking  for.  Great,  but  that  leaves  40  percent  who  found  only  some 
of  what  they  were  looking  for  and  16  percent  who  found  none.  Now,  note  that 
most  of  these  libraries  were  very  large  academic  libraries  (though,  as  a 
matter  of  fact,  the  figures  do  not  vary  much  across  library  types  anyway). 

Yet  despite  the  great  size  of  these  libraries,  over  half  of  the  users  found 
less  than  most  or  all  of  what  they  were  looking  for.  I  think  it  highly 
likely  that  in  most  cases  the  material  was  there  and  the  subject  access  did 
not  connect  people  with  that  material. 




I  propose  that  the  direction  we  go  in  is  to  make  online  information 
systems  more  generous  so  that  we  can  reduce  that  55  percent  figure.  In 
general,  give  people  more  than  they  ask  for  and  let  them  override  it  if  they 
want  by  simply  ignoring  it.  Heaven  forbid  that  we  should  hassle  people  by 
giving  them  material  that  they  really  do  not  want  and  which  is  difficult  to 
ignore.  I  do  not  want  to  swamp  people  and  am  not  proposing  that.  The 
override  should  be  simply  a  matter  of  ignoring  any  part  or  all  of  a  wide 
array  of  data  that  is  on  the  screen.  They  should  be  able  to  pick  out  parts 
that  they  want  and  leave  the  parts  that  they  do  not  want.  It  should  not  be 
any  more  complicated  than  that.  But  I  do  think  that  we  should  put  out  more 
information  than  they  ask  for,  and  I  will  tell  you  why. 

For  one  thing,  our  minds  are  very  economical.  Once  we  find  the  term  to 
label  some  concept,  the  presence  of  that  term  in  our  minds  tends  to  block  out 
other  variant  labels  for  that  concept  by  a  psychological  interference 
process.  It  is  hard  to  think  up  several  terms  for  the  same  concept,  yet  most 
information  systems  require  that.  Relevant  information  may  be  found  under 
several  related  terms  and  at  several  levels  of  generality.  The  average 
person  approaching  the  system  to  search  on  "hypnosis"  would  probably  think, 
"What  other  term  could  there  be  for  such  a  discrete,  distinctive  phenomenon?" 

But  good  information  may  be  found  under  "hypnosis,"  "mesmerism,"  "altered 
states  of  consciousness,"  and  many  others  I  cannot  recall.  Research  shows 
that people look in only one place in a catalog the majority of the time; yet in
my  dissertation  I  found  that  they  use  a  term  that  matches  with  the  relevant 
information  on  their  topic  of  interest  only  about  20  percent  of  the  time.  So 
the  rest  of  the  time  they  thought  they  were  finding  the  information  the 
library  had  but  were  in  fact  finding  only  peripheral  and  far  less  relevant 
information. 

This is old hat to people like Tam Doszkocs and Richard Marcus, who are
designing systems that expand searches by providing related terms for the
searcher, or that provide the searcher with terms he or she cannot think of
and would not realize need to be thought of.

Subject  access  is  a  much  more  complicated  business  than  most  folks  realize. 

Now  let  us  move  to  my  second  reason  for  being  generous.  There  are 
basically  two  kinds  of  information  that  we  need:  information  we  know  we  do 
not  know  and  information  we  do  not  know  we  do  not  know.  With  the  first  kind, 
there  is  a  gap  in  the  map.  One  has  a  pre-existing  cognitive  structure,  say, 
a knowledge of current political realities in the nation of Turkey, and one
may realize that to proceed further in thinking on that topic at a given
time, it is necessary to know the population of Turkey. One then proceeds to
plug that hole by acquiring this information. But there is a hole there in
the  first  place  because  it  is  surrounded  by  knowledge  that  one  does  have. 




On the other hand, where the map has not been charted yet, where the
cognitive structure has not been built, then one has no sense of what one
does not know. There is no framework for it. If another researcher takes an
approach to a problem that has not occurred to me, then I cannot know to look
for that approach even though it may be highly relevant to my own work. This
may seem obvious when said, but information retrieval systems do little
deliberately to help searchers get this second kind of information. By
definition,  the  searcher  cannot  know  to  ask  for  the  second  kind  of 
information  directly.  We  need  to  study  how  we  might  provide  the  searcher 
with  information  of  possible  unpredicted  relevance. 

All  current  online  systems  are  vulnerable  in  this  regard.  When  you  do 
not  know  what  you  do  not  know,  the  best  search  strategy  is  to  randomly  expose 
yourself  to  information  of  possible  relevance.  This  is  what  browsing  is, 
and  it  is  generally  difficult  to  browse  in  online  systems. 

I’ve  given  a  couple  of  reasons  why  I  think  systems  should  be  more 
generous.  Let  me  now  be  more  specific  about  what  I  mean  by  that  term. 

One  of  the  carry-overs  from  traditional  information  systems, 
particularly  catalogs  and  abstracting  and  indexing  services,  is  the 
assumption  that  to  enrich  access  to  documents  or  other  forms  of  information 
means  adding  access  points  (descriptors  or  index  terms)  to  each  document  or 
document  reference.  That  adds  a  lot  of  storage  and  indexing  complexity  and 
hence  a  lot  of  cost  to  the  system,  so  we  naturally  resist  the  idea  of 
enriching  that  access  for  real  practical  reasons.  But  somehow  in  all  of 
this,  another  approach  has  been  virtually  ignored,  even  though  in  some 
respects it was just as possible in the old manual system as it is in an
online system. This is the approach of having a rich, complex access
structure apart from the indexing of individual documents. The linkage to
the  document  need  be  made  only  fairly  late  in  the  process. 

Design  the  system  so  the  searcher  explores  among  the  terms  in  broad 
subject  areas  with  all  sorts  of  hints,  suggestions,  and  lines  of  thought 
presented  along  the  way.  Such  an  up-front  conceptual  structure  can  be  quite 
complicated  with  rich  inter-connecting  networks  of  relationships — without 
having  to  attach  all  of  that  complexity  to  any  individual  documents.  The 
searcher  enters  the  system  with  a  single  term  in  mind,  perhaps  one  that  might 
be  quite  inappropriate  with  respect  to  the  existing  document  indexing.  That 
is  all  right  because  the  system  responds  by  showing  the  searcher  what  terms 
are  used  in  the  system  to  index  that  topic.  Further,  it  asks  if  the  searcher 
might  also  be  interested  in  other,  related,  terms,  as  well.  The  system  might 
also  note  that  the  stated  term  can  be  considered  as  part  of  several  different 
topic  hierarchies.  Is  the  searcher  interested  in  composition  as  part  of 
English? As a part of material science? As a part of music? The searcher
is shown a rich context for each term used and is thus made indirectly to
realize that such a database is richer and more complex and perhaps more
interesting than they thought. The searcher sees that perhaps there are
better terms or additional terms or that there are whole areas that had not
come  to  mind — in  other  words,  the  browsing  function.  All  this  occurs 
without  the  searcher  having  yet  looked  at  a  single  document  or  document 
record.  This  whole  structure  is  up  front  of  the  indexing. 

If  the  purpose  of  information  retrieval  is  to  put  the  user  together  with 
the  document,  then  this  up-front  system  is  a  kind  of  interface  to  facilitate 
that  process.  In  such  a  system,  the  searcher  moves  around  in  a  rich 
linguistic  and  conceptual  brew  before  asking  for  a  retrieval  on  particular 
documents.  He  or  she  may  look  at  some  trial  records  as  a  probe  to  see  what 
various  terms  cover,  but  the  assumption  is  that  the  searcher  can  move  around 
in  this  conceptual  front  end — a  sort  of  system  mind — before  settling  down  on 
selecting  documents. 

Note  that  I  am  not  merely  suggesting  an  online  thesaurus.  Thesauri  have 
always  been  designed  for  the  indexer  and  only  secondarily,  if  at  all,  for  the 
searcher.  We  information  types  tend  to  resist  putting  these  thesauri  onto 
online  systems  or  making  them  available  in  a  manual  situation,  because  they 
really  are  designed  for  the  indexer  and  not  the  searcher.  Thesauri  use 
obscure  symbols  and  abbreviations,  include  terms  not  actually  indexing 
documents  in  a  particular  system,  and  have  a  limited  number  of  cross 
references.  They  do  not  include  many  of  the  casual,  popular  terms  or  terms 
phrased  in  natural  language  grammar  rather  than  the  index-term  grammar  of 
thesauri.  In  short,  typical  thesauri  are  ill-suited  for  helping  the  end  user 
in  the  way  that  I  am  talking  about. 

Now  there  are  two  points  that  I  want  to  emphasize  about  this  front-end 
system  that  I  am  proposing.  First,  there  should  be  many,  many  entry  terms. 
Currently,  we  actually  make  it  hard  to  get  into  the  system  in  the  first  place 
because  the  searcher  must  have  terms  in  the  right  grammatical  form  to  fit  the 
indexing  system,  and  we  do  not  allow  people  to  start  with  the  more  colloquial 
phrasing.  Users  know  they  have  to  "tame"  their  natural  phrasing  before  they 
even  approach  the  system.  Why  not  make  it  easy  to  get  in — provide  a  vast 
entry  vocabulary — and  then  guide  the  searcher  through  all  the  terminological 
possibilities  to  the  terms  that  actually  index  documents  of  interest  to 
someone  who  started  with  the  colloquial  terms?  If  you  have  twenty  entry 
phrases  for  a  particular  concept,  you  do  not  have  to  index  all  the  documents 
on  that  concept  by  the  twenty  entry  terms.  Just  pick  two  or  three  and  let 
the  system  tell  the  searcher  which  one  to  use.  So  one  can  vastly  increase 
the  entry  richness  without  significantly  increasing  the  storage  indexing 
costs. 
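A minimal sketch of this separation between a large entry vocabulary and thin document indexing follows. Everything in it is an assumption made for illustration: the entry phrases, the index terms, the document identifiers, and the function names `suggest_index_terms` and `retrieve` are hypothetical and do not describe any existing system.

```python
# Hypothetical entry vocabulary: many colloquial phrases lead to a few index terms.
ENTRY_VOCABULARY = {
    "hypnosis": ["Hypnotism"],
    "hypnotism": ["Hypnotism"],
    "mesmerism": ["Hypnotism"],
    "trance": ["Hypnotism", "Altered states of consciousness"],
    "altered states": ["Altered states of consciousness"],
}

# Thin document indexing: documents are posted only under the index terms,
# never under the entry phrases.
DOCUMENT_INDEX = {
    "Hypnotism": ["doc-1", "doc-2"],
    "Altered states of consciousness": ["doc-2", "doc-3"],
}


def suggest_index_terms(entry_phrase):
    """Map whatever the searcher typed to the terms that actually index documents."""
    return ENTRY_VOCABULARY.get(entry_phrase.strip().lower(), [])


def retrieve(index_term):
    """Late linkage: documents are looked up only once an index term is chosen."""
    return DOCUMENT_INDEX.get(index_term, [])


for phrase in ("Mesmerism", "trance"):
    terms = suggest_index_terms(phrase)
    print(phrase, "->", terms)
    for term in terms:
        print("   ", term, "->", retrieve(term))
```

Adding a sixth or a twentieth entry phrase changes only `ENTRY_VOCABULARY`; the postings in `DOCUMENT_INDEX` stay the same size, which is why the entry richness can grow without a matching growth in indexing cost.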


Secondly,  we  have  not  begun  to  tap  the  possibilities  in  creating  this 
front-end  mind  in  an  automated  environment — there  are  many  interesting 
possibilities.  Terms  can  be  shown  not  only  in  alphabetical  proximity  to 
entry  terms,  as  they  are  now  with  "neighbor"  commands  and  the  like,  but  also 
in  a  variety  of  other  ways.  For  example,  a  term  can  be  shown  in  several 
hierarchies of related terms; many concepts are polyhierarchical in that they
can  fit  logically  into  several  hierarchies.  All  of  these  for  a  term  could  be 
displayed  at  once  on  the  screen,  each  with  the  entry  term  in  the  center  of  a 
small  tree  of  broader  and  narrower  terms. 
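The display described here, one small tree per hierarchy with the entry term in the middle, might be sketched as follows. The three hierarchies reuse the "composition" example from earlier in the talk, but the broader and narrower terms, the `HIERARCHIES` table, and the function `show_hierarchies` are all invented for illustration.

```python
# Hypothetical hierarchies. Each maps a term to (broader term, narrower terms).
HIERARCHIES = {
    "English": {"composition": ("writing", ["essay", "rhetoric"])},
    "Material science": {"composition": ("material properties", ["alloy", "mixture"])},
    "Music": {"composition": ("music theory", ["counterpoint", "orchestration"])},
}


def show_hierarchies(term):
    """Return one small tree per hierarchy that contains the entry term."""
    lines = []
    for field, tree in HIERARCHIES.items():
        if term in tree:
            broader, narrower = tree[term]
            lines.append(f"[{field}]")
            lines.append(f"  {broader}")
            lines.append(f"    > {term}")
            for narrower_term in narrower:
                lines.append(f"      {narrower_term}")
    return "\n".join(lines)


print(show_hierarchies("composition"))
```

A real front end would lay the trees out side by side on the screen; the sketch only shows that a single entry term can be resolved against several hierarchies at once.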

Another  way  of  displaying  relatedness  is  showing  co-indexing.  For 
example,  some  sample  documents  can  be  accessed  by  the  system  and  all  other 
terms  besides  the  search  term  which  are  applied  to  the  sample  documents  can 
be  shown  also.  Such  displays  can  be  an  interesting  nudge  to  serendipity. 
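Co-indexing can be sketched as a simple count over a sample of documents. The document records below and the function name `co_indexed_terms` are assumptions made for this example; ranking the other terms by how often they co-occur is one reasonable display order, not something specified in the talk.

```python
from collections import Counter

# Hypothetical indexing records: document id -> index terms applied to it.
DOCUMENT_TERMS = {
    "doc-1": {"hypnosis", "psychotherapy"},
    "doc-2": {"hypnosis", "altered states of consciousness"},
    "doc-3": {"hypnosis", "psychotherapy", "pain"},
    "doc-4": {"sleep"},
}


def co_indexed_terms(search_term, sample_size=10):
    """Count the other terms applied to a sample of documents indexed by search_term."""
    sample = [
        doc_id
        for doc_id, terms in sorted(DOCUMENT_TERMS.items())
        if search_term in terms
    ][:sample_size]
    counts = Counter()
    for doc_id in sample:
        # sorted() keeps the counting order stable from run to run.
        counts.update(sorted(DOCUMENT_TERMS[doc_id] - {search_term}))
    return counts.most_common()


for term, count in co_indexed_terms("hypnosis"):
    print(count, term)
```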

Alphabetically  proximate  terms  can  be  shown  from  several,  not  just  one, 
thesauri.  Remember,  we  are  just  giving  people  ideas  at  this  point,  so  it  is 
all  right  to  have  several  thesauri  as  long  as  the  searcher  is  guided  at  some 
point  to  terms  actually  used  in  indexing  specific  documents.  There  are  many 
forms  of  connectedness  that  we  can  display,  some  of  them  quite  easy  to 
produce  and  others  more  difficult.  We  can  create  an  amazing  richness,  I 
think. 
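Showing alphabetical neighbors from several thesauri at once could look something like the sketch below. The two thesauri and their terms are invented, and `neighbors` is a hypothetical stand-in for a "neighbor" command; it uses Python's standard `bisect` module to find where the entry term falls in each sorted list, whether or not that thesaurus contains the term.

```python
import bisect

# Hypothetical thesauri, each kept as a sorted list of terms.
THESAURI = {
    "Thesaurus A": sorted(["hypnosis", "hypnotherapy", "hysteria", "hygiene"]),
    "Thesaurus B": sorted(["hypnotism", "hypothermia", "hymns"]),
}


def neighbors(entry_term, width=1):
    """Return up to `width` terms on each side of entry_term in every thesaurus."""
    result = {}
    for name, terms in THESAURI.items():
        pos = bisect.bisect_left(terms, entry_term)
        present = pos < len(terms) and terms[pos] == entry_term
        # If the term itself is listed, include it between its neighbors.
        end = pos + width + (1 if present else 0)
        result[name] = terms[max(0, pos - width):end]
    return result


for name, terms in neighbors("hypnosis").items():
    print(name, terms)
```

Because the searcher is only being given ideas at this stage, it does not matter that "hypnosis" is missing from the second list; its nearest neighbors there are still shown.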

We  have  not  done  much  of  this  yet  because  we  keep  linking  the  access  to 
the  indexing.  We  can  have  a  conventional,  fairly  thin— but  I  would  not 
recommend  too  thin — indexing  of  documents  linked  with  a  rich  access  structure 
which  can  be  very  complex,  and  which  helps  the  searcher  find  the  best  terms 
before  finally  selecting — all  fairly  cheaply. 

The  irony  is  that  a  rich  entry  vocabulary,  such  as  I  have  described, 
could  have  been  provided  in  manual  catalogs  a  hundred  years  ago.  The 
argument  has  always  been  made  that  current  library  subject  cataloging  is  so 
stingy  because  each  access  point  represents  another  card  in  the  catalog  and 
hence,  terrible  bulk  if  you  have  more  than  a  few  entries  per  document.  But  a 
rich  entry  vocabulary,  apart  from  the  indexing  of  documents,  could  have  been 
supplied in notebooks hanging from chains by the catalog a hundred years
ago, without adding a single card to the catalog.

Now I want to tackle our assumptions about information system design at an
even more fundamental level. The central model or paradigm of information
retrieval research centers around the idea of a match between the search
query and the document indexing in the system. The query is broken down into
its component terms, which are then, in turn, matched successively against
indexing on all documents in the system. Documents which match, according to
whatever algorithm is used, are then disgorged from the system as a retrieved
set.  This  set  is  then  evaluated  by  the  requester  or  searcher  and  documents 
are  categorized  as  to  whether  they  are  relevant  or  irrelevant.  Relevance 
figures  are  then  used  to  determine  recall  and  precision  rates,  and  systems 
that  have  the  best  recall  and  precision  are  considered  to  be  the  best 
systems. 
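For readers who want the two measures spelled out: precision is the fraction of the retrieved set that is relevant, and recall is the fraction of all relevant documents that were retrieved. A minimal sketch, with made-up document identifiers and a hypothetical function name, follows.

```python
def precision_and_recall(retrieved, relevant):
    """Compute (precision, recall) for a retrieved set against relevance judgments."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Guard against empty sets so the ratios are always defined.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Illustrative data only: four documents retrieved, three judged relevant.
retrieved_set = ["d1", "d2", "d3", "d4"]
relevant_set = ["d2", "d4", "d5"]

precision, recall = precision_and_recall(retrieved_set, relevant_set)
print(f"precision = {precision:.2f}, recall = {recall:.2f}")
```

Both figures are computed from a single retrieved set at a single moment, which is exactly the pinpoint view of retrieval that this evaluation method assumes.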

Information  retrieval  is  thus  seen  as  a  pinpoint  match  at  one  moment. 
There  may  be  a  lot  of  activity  preceding  that  one  moment,  a  lot  of  indexing 
at  one  end  and  a  lot  of  modification  and  design  of  the  search  formulation  at 
the  other  end,  but  the  end  result  of  all  of  this  is  that  at  some  moment  a 
match  takes  place  between  query  and  document  indexing  to  produce  a  retrieved 
set. 

Sometimes  this  model  is  elaborated  to  be  an  iterative  process  with  the 
searcher  acting  on  system  feedback,  but  even  here  the  assumption  is  that  the 
iterations lead to the final best matching set. And so once again, we have
this  pinpoint  match  idea. 

I  would  like  to  propose  a  different  paradigm,  or  at  least  one  that  can 
exist  in  parallel  to  this  one.  Nicholas  Belkin  has  argued  that  current 
systems  make  excessive  demands  on  the  requester.  The  requester  must  have  a 
fully articulated, logically coherent request, one that can be put to the
system  for  matching  purposes.  But  as  Robert  Taylor  pointed  out  a  number  of 
years  ago,  the  query  goes  through  a  number  of  stages  of  development  in  the 
mind  of  the  requester  from  a  crude  visceral  "felt  need"  all  the  way  up  to  a 
fully  articulated  and  rational  query  adapted  to  the  vocabulary  of  the 
information  system.  Belkin  has  argued  that  we  require  the  user  to  do  all  the 
work  of  rationalizing  and  translating  that  query  before  coming  to  use  an 
information  system.  Why  should  we  not  enable  the  user  to  initiate  the  search 
for  information  earlier  in  this  cognitive  process? 

Belkin  says  that  the  requester  has  what  he  calls  an  "anomalous  state  of 
knowledge,"  or  "ASK."  He  argues  that  to  transform  the  ASK  into  a 
well-defined  query  is  difficult  and  constitutes  an  unreasonable  demand  on  the 
requester.  (The  basic  paradox  of  information  searching  is  that  you  are 
always  asking  for  or  about  something  you  do  not  know.)  He  argues  that  the 
searcher  should  be  able  to  come  to  the  system  in  the  "ASK"  stage,  and  get  the 
needed  information. 

Robert  Oddy  developed  a  system  called  "Thomas"  based  on  these 
principles.  The  user  need  only  produce  a  single  word  and  the  system  then 
provides  a  rich  array  of  possible  directions  to  go  in.  The  searcher  can 
explore  his  or  her  way  through  the  information,  picking  up  what  he  or  she 
wants  along  the  way.  There  is  no  final  single  step,  rather  a  learning 
process  in  which  the  original  interest  is  developed  and  modified  and 
documents  and  data  taken  away  during  this  process  as  the  searcher  wishes.  I 
do  not  think  Oddy  would  have  articulated  it  quite  this  way;  I  should  say  that 
I  am  adding  some  of  my  own  interpretations. 


Michael  Williams,  a  psychologist  at  Xerox  PARC,  has  been  doing  some  work 
along  the  same  lines  which  he  calls  "query  by  instantiation."  The  searcher 
does not have to name the need. The system gives suggestions and examples,
and the searcher responds to them in turn, which is sufficient to produce an
effective process.

Let  us  call  the  conventional  approach  the  "matching  paradigm"  and  this 
one  the  "exploratory  paradigm."  In  the  latter  case,  the  searcher  and  the 
system  meet  earlier  in  the  process,  while  the  need  is  not  yet  fully 
crystallized.  With  the  exploratory  paradigm,  we,  as  information  scientists, 
do  not  have  to  worry  about  whether  we  have  provided  the  perfect  match  for  the 
perfectly  articulated  search  query  as  a  result  of  perfect  indexing  with  the 
ideal  vocabulary.  Rather,  we  worry  about  whether  we  have  provided  a  rich 
linguistic  and  conceptual  world  to  explore  in.  That  world  has  frequent 
linkages to actual documents. There is a connection between the system mind
and the documents; there is indexing. But it is the searcher's choice
whether  to  follow  up  on  any  of  these  linkages. 

In  a  funny  way,  we  have  tried  to  do  both  too  much  and  too  little  in  the 
past  for  users.  We  try  to  design  the  perfect  system  so  the  searcher's  first 
search formulation matches with the ideal retrieved set. This is a tall order
and  it  is  debatable  whether  we  can  ever  do  it.  On  the  other  hand,  we  deprive 
the  searcher  of  doing  the  exploring  that  most  people  like  to  do  with 
information  and  need  to  do  at  one  time  or  another.  But  by  all  means,  let  us 
continue  to  design  systems  that  require  as  little  as  possible  for  the 
searcher  to  do.  The  principle  of  least  effort  is  an  overwhelming  factor  in 
people's  information  seeking.  But  let  us  also  design  systems  according  to 
the  exploratory  paradigm  so  people  can  fish  around  in  information,  can  play 
with  it  as  they  would  a  video  game— so  they  can  take  pleasure  in  a  treasure 
hunt  for  information  on  those  occasions  when  that  is  what  they  want  or  need 
to  do. 

Mr.  Bollinger 

We  have  time  for  two  or  three  questions  if  there  are  any. 


QUESTION 

Our terminology bothers me. When you say systems, do you include
people, instead of just hardware and software, as a potential interface
between somebody asking the question and the retrieval? In other words,
someone on the telephone lines helping the asker with interpreting?


Dr.  Bates 


By all means. I am using "searcher" here to mean both end user and
intermediary.  I  think  often  even  the  librarian  who  has  received  the  request 
might  want  to  do  some  exploring  that  way  too,  because  the  end  user  who  made 
the  request  does  not  know  about  all  the  possible  directions.  In  other  cases, 
the question will be very cut and dried and the experienced intermediary will
know that the one best term for that thing is X, and will go in with X and get
the answer. That is the kind of search that our systems do best now. But I
think  our  system  capabilities  under-perform  for  all  the  other  query  types. 

Comment 


There is very little human control.

Dr. Bates


I  think  the  existing  systems  do  not  take  enough  account  of  certain 
characteristics  of  cognitive  processing  of  information.  There  is  a  lot  of 
research  that  we  need  to  do.  We  in  information  science  do  not  know  enough 
right  now  about  how  minds  operate,  but  we  get  these  little  hints  here  and 
there  like  this  interference  effect  I  mentioned,  namely,  difficulty  in 
thinking  of  other  terms  once  you  have  a  label  for  something  in  your  mind. 
There  are  probably  a  lot  more  psychological  patterns  like  that  that  operate 
when  people  confront  an  information  system.  We  need  to  research  those  things 
and  design  systems  from  the  user  back.  We  are  still  technology  driven.  We 
design  systems  from  the  technology  forward  instead  of  from  the  user  back. 

Comment 

I  think  it  is  very  stimulating  to  think  through  the  ideas  you  presented 
and  it  reminded  me  that  maybe  in  addition  to  just  presenting  a  rich 
navigational  brew  up  front,  if  you  will,  of  things  to  look  at  and  to  be 
reminded  of,  which  is  a  very,  very  useful  thing,  we  should  think  of  the  fact 
that  you  only  recognize  things  as  potentially  useful  if  you  already  at  least 
vaguely  know  them.  Whereas  in  real  life  you  can  easily  see  that  if  you  start 
suggesting,  a  lot  of  things  will  not  be  known  to  you  and  that  does  not  mean 
that  they  might  not  be  pertinent.  So  we  should  think  of — and  this  has  been 
proposed  by  people — linking  an  instructional,  definitional  component.  For 
example,  linking  computer  assisted  instruction  type  systems— that  would  be 
very  ambitious— but  at  modest  levels,  appropriate  levels,  to  this  exploratory 
navigational tool because for an average person looking for some medical
information, they do not know that mumbo-jumbo. Yet, they may be very well
concerned  about  heart  attacks  and  bypasses  and  what  not,  and  they  would  like 
to  learn,  and  a  real  definitional  instructional  capability  would  come  in 
handy  quite  often. 

Dr.  Bates 

I  think  there  are  two  senses  of  that,  too.  From  the  heart  attack 
example,  you  could  give  the  medical  definitions  and  you  could  also  provide 
scope  notes  beyond  what  are  traditionally  provided  in  the  thesauri  or 
indexes.  You  could  provide  scope  notes  that  would  explain  to  the  searcher 
the  distinctions  that  the  searcher  may  have  trouble  making  that  the  indexer 
assumes  we  already  know.  So  you  are  helping  them  both  with  the  conceptual 
content  as  well  as  the  organizational  rules  used  in  that  system.  That’s  an 
interesting  idea. 


HUMAN-COMPUTER  INTERACTION  RESEARCH  AND 
INFORMATION  RETRIEVAL  SYSTEMS 
(abstract) 

Christine  L.  Borgman 
Assistant  Professor 

Graduate  School  of  Library  and  Information 
Science 

University  of  California,  Los  Angeles 


Human-Computer  Interaction  Research  and  Information 
Retrieval  Systems:  Issues  and  Implications 


A  primary  goal  of  human-computer  interaction  research  related  to  information 
retrieval  systems  is  to  make  systems  sufficiently  easy  to  use  so  that  the 
distinction  between  skilled  search  intermediaries  and  end  users  ceases  to  be  of 
importance.  One  way  to  accomplish  that  goal  (and  a  central  concern  of  this 
conference)  is  to  develop  search  assistance  programs  that  can  serve  as 
"automated  search  intermediaries,"  simplifying  the  interface  and  alleviating  the 
need  for  the  human  intermediary.  We  must  then  ask  whether  this  is  a  realistic 
goal. 

Automated  information  retrieval  systems  are  widely  available  and  the  number 
of  databases  and  systems  is  increasing  rapidly.  But  who  is  using  them?  The 
predicted  takeover  by  end  users  has  not  occurred  and  human  search  intermediaries 
are  not  yet  an  endangered  species.  We  find  a  similar  unwillingness  by  end  users 
to  access  online  catalogs  in  libraries,  due  to  the  time  investment  required  for 
learning. 

Once  people  do  use  the  systems,  we  must  ask  about  the  quality  of  their 
performance.  We  are  finding  that  both  commercial  retrieval  systems  and  online 
catalogs  are  difficult  technologies  for  many  people  to  conquer.  Search 
intermediaries  seem  to  have  few  problems  with  the  mechanics  of  interactive 
systems,  but  still  have  difficulty  with  some  of  the  conceptual  aspects.  In 
online  catalogs,  recent  studies  show  consistently  high  error  rates  across 
systems  and  a  tendency  to  abandon  the  system  before  achieving  meaningful 
results. 

Can  automated  intermediaries  solve  these  problems?  The  currently-available 
assistance  programs  are  able  to  assist  in  the  mechanical  aspects  of  searching, 
but  provide  little  assistance  for  the  conceptual  aspects,  such  as  structuring 
poorly-phrased  questions  in  terms  of  system  capabilities  and  selecting 
appropriate  databases  and  vocabulary  terms. 

Basic  and  applied  research  on  several  of  the  psychological  aspects  of 
human-computer  interaction  shows  promise  for  alleviating  some  of  these  problems. 
Mental  models  research  suggests  that  people  develop  a  model  of  a  system  for  use 
in  interacting  with  it.  Systems  and  training  designed  around  an  appealing  and 
intuitive  conceptual  model  can  ease  both  the  learning  and  use  of  interactive 
systems.  Research  into  information  processing  models  promises  to  optimize 
screen  displays  and  input  devices  to  human  processing  limits,  increasing 
efficiency  and  decreasing  error  rates.  Individual  differences  research  may 
eliminate  problems  that  certain  sectors  of  the  population  have  with  specific 
systems,  increasing  access  to  the  technology  for  all. 


THE  USER  INTERFACE:  SOME  PRELIMINARY  RESULTS 
FROM  THE  DARTMOUTH  ONLINE  CATALOG 


Emily  Fayen 

Assistant  Director  for  Library  Systems 
Van  Pelt  Library 
University  of  Pennsylvania 
(at  time  of  the  conference: 

Director,  Library  Automation 
Baker  Library,  Dartmouth) 


The User Interface: Some Preliminary Results from the Dartmouth Online Catalog


The Dartmouth Online Catalog Project began in early 1979. It has evolved
gradually  over  the  past  five  years.  The  system  is  now  running  on  a  DEC  VAX 
11/750  computer.  It  has  4  Mb  of  main  memory  and  about  2  Gb  of  online 
storage.  The  online  catalog  contains  a  little  over  360,000  records.  The 
Dartmouth  Online  Catalog  uses  BRS/Search:  The  Mini/Micro  Version  as  the 
underlying  software.  The  system  is  running  under  the  Berkeley  4.2  version  of 
UNIX  and  the  latest  version  of  the  interface  is  written  in  C. 

Dartmouth  College  is  a  Telenet  node  and  has  installed  a  local  area  network 
to  link  the  campus  together.  Thus,  students,  faculty,  and  other  online 
searchers  can  get  access  to  the  Dartmouth  Online  Catalog  and  various  other 
online  databases  through  the  local  area  network  and  the  Telenet  connection. 
There  are  a  number  of  students  in  the  Amos  Tuck  Business  School  who  have 
access  to  BRS/After  Dark.  These  students  have  been  using  BRS/After  Dark  on 
their  own  and  have  been  using  it  very  heavily. 

Given  the  environment  at  Dartmouth,  it  would  be  nice  to  have  the  time  and 
staff  needed  to  do  some  formal  research  on  human  factors.  Most  of  our 
results,  however,  are  based  on  empirical  findings.  Charles  Hildreth,  at 
OCLC,  Inc.,  is  looking  at  some  transaction  data  from  the  Dartmouth  Online 
Catalog,  and  he  will  be  able  to  provide  some  more  formal  information  for 
future  analysis. 

One of the most significant findings is that users' desired styles of online
interaction differ widely. Of course, one of the difficulties in
studying  this  aspect  of  online  interaction  is  that  new  users  very  quickly 
lose  their  naivete,  so  a  new  pool  of  new  users  is  needed  very  often. 

However,  the  type  of  interface  that  users  want  (whether  new  to  the  system  or 
not) seems to span a range from those who would prefer a blank screen with
perhaps  a  blinking  question  mark  in  the  middle  of  it  to  those  who  want  a 
detailed,  cook-book-like  approach  with  step-by-step  instructions  as  to  how  to 
conduct  the  inquiry. 

Another  critical  factor  is  the  log-on  procedure.  It  must  be  kept  very 
simple.  It  must  be  very  easy  for  the  user  to  log  on  and  to  get  started.  Dr. 
Borgman  uses  the  video  game  analogy,  where  it  is  very  easy  for  the  new  user 
to  insert  a  quarter  and  start  to  play  the  game. 


It  must  be  very  easy  to  perform  basic  operations.  It  must  also  be  very  easy 
for  users  to  learn  to  perform  more  complex  operations.  Here  again,  the  video 
game  parlour  is  a  good  example.  No  first-time  users  are  experts,  but  they 
all  very  quickly  learn  how  to  play  the  game  and  how  to  advance  their  skills. 

Another  important  finding  is  that  users  need  to  feel  that  they  are  in  control 
of  the  online  system  that  they  are  using.  They  don't  want  to  have  any 
surprises.  Ideally,  there  should  be  no  error  messages.  The  Apple  Macintosh 
has  a  very  positive  approach  to  errors.  It  is  virtually  impossible  to  make  a 
"mistake"  using  it.  The  system  may  not  do  what  you  intended  the  first  time, 
but  it  never  tells  you  that  you  did  a  bad  thing.  Dr.  Borgman  makes  another 
very  important  point  when  she  states  that  users  need  a  mental  model  of  how 
the  system  works.  Users  may  not  know  how  the  computer  system  or  search 
software  actually  accomplishes  what  it  does,  but  they  need  to  have  an  idea  in 
their  heads  about  how  they  can  use  the  system  to  accomplish  their  research 
needs.  For  example,  they  need  to  know  how  to  get  the  online  catalog  to 
display again the search results from query number 5. Users don't need to
understand  what  the  computer  system  has  to  do  to  make  this  happen,  but  they 
need  to  have  a  functional  understanding  of  how  the  system  works.  That  is, 
users  need  to  understand  the  relationships  between  the  commands  or  menu 
choices  that  they  make  and  the  system  responses. 

Furthermore,  users  need  a  mental  map  of  the  system  so  they  understand  the 
relationships  among  various  functions  and  how  they  can  move  from  one  to 
another.  In  addition,  users  need  to  know  how  to  go  back  and  review  what  they 
have  just  done,  or  perhaps  to  change  their  minds  and  execute  a  particular 
function  again. 

As  mentioned  earlier,  users  have  different  desires  with  respect  to  the  amount 
of  dialogue  that  they  want  from  the  system.  Users  may  also  have  different 
needs  at  different  times;  for  example  new  users  need  more  elaborate 
instructions  than  experienced  ones.  Infrequent  users  have  other  needs— they 
basically  remember  from  one  session  to  the  next  how  the  system  works,  but 
they  need  prompting  for  the  appropriate  command  syntax  and  so  forth.  These 
users  need  an  easy  way  to  get  help  from  the  system  any  time  they  forget  how 
to  do  something  or  don't  remember  exactly  the  command  structure. 

But  these  users  do  not  want  to  be  burdened  with  time-consuming  menus  and 
lengthy  explanations. 

The  very  experienced  user  needs  to  be  able  to  cut  through  all  the  menus, 
explanations,  online  tutorials,  and  other  user  aids  and  interact  with  the 
underlying  system  at  its  most  efficient  level.  These  users  know  what  all  the 
features  do  and  need  ways  to  enter  commands  quickly  and  easily  and  with  a 
minimum  of  keystrokes.  However,  even  these  highly  skilled  and  trained  users 
may  occasionally  need  to  use  a  new  option  or  feature  and  then  need  to  get 
help  from  the  system,  so  it  must  be  very  easy  for  the  user  to  move  back  and 
forth  from  one  dialogue  mode  to  another. 


Users  also  need  to  be  able  to  stop  any  procedure  at  any  time  without  crashing 
the  system  or  putting  it  into  some  kind  of  limbo.  All  of  us  remember 
instances  when  we  have  entered  a  search  term  in  error  or  made  some  other 
mistake  and  just  want  to  stop  whatever  is  going  on  and  have  the  system  put  us 
right  back  where  we  were  when  the  error  occurred. 

Users  also  need  some  kind  of  system-to-system  similarity.  As  users  move 
around  the  country  and  as  they  are  able  to  sign  on  to  more  and  more  systems 
from  their  local  terminals  using  various  telecommunications  packages,  the 
need  for  inter-system  consistency  becomes  ever  more  important.  A  common 
command  vocabulary  and  syntax  is  extremely  important. 

Another  type  of  interface  that  users  encounter  is  the  interface  to  various 
computer  systems.  Local  area  networks  are  becoming  extremely  important  in 
enabling  users  to  call  up  various  remote  systems,  but  they  do  not  entirely 
overcome  the  problems  of  incompatible  modems,  terminals,  and  cables.  The 
microcomputer  running  terminal  emulation  software  is  a  big  step  forward  in 
overcoming  some  of  these  interfacing  problems. 

As  more  and  more  software  packages,  database  managers,  and  compilers  become 
available,  users  need  various  interfaces  that  will  lend  some  consistency 
across  these  various  offerings.  That  is,  the  fact  that  there  are  a  number  of 
systems  and  software  packages  running  together  or  separately  must  be 
transparent  to  the  user.  From  a  functional  standpoint,  it  should  appear  to 
the  user  that  there  is  one  system  involved — the  fact  that  many  separate 
inter-related  systems  may  be  actually  supporting  the  user’s  activities  should 
be  totally  hidden  from  the  user. 

Finally,  the  user-to-user  connection  is  extremely  important.  We  must  not 
lose  sight  of  the  fact  that  we  are  in  the  business  of  delivering  information, 
and  sometimes  a  human  source  may  provide  the  quickest  and  best  response.  We 
must  not  become  so  bound  up  in  the  "correct"  way  to  do  online  searching  or 
retrieve  information  that  we  forget  that  the  important  thing  is  to  get  the 
information  to  the  user  in  as  timely  and  inexpensive  a  fashion  as  possible. 
The  wide  variety  of  online  end-user  accessible  information  services  supports 
the  contention  that  users  want  to  be  able  to  get  the  information  themselves 
and  are  willing  to  take  the  time  and  trouble  to  learn  how  to  use  these 
systems,  knowing  that  in  the  future  this  knowledge  will  shorten  the  time  it 
takes  to  get  information. 

Other widely used sources of information are the various electronic mail
systems  and  bulletin  boards  springing  up  around  the  world.  In  many  of  these, 
a  user  can  send  out  a  query  to  the  community  at  large  and  hope  for  a  response 
from  some  fellow  bulletin-board  user.  These  can  be  extremely  useful  sources 
of  information  because  they  often  bridge  the  gap  between  published  sources 
and  the  first-hand  expert  knowledge  that  one  used  to  be  able  to  get  only  from 
a  face-to-face  chat  or  via  telephone.  These  links  now  make  it  possible  for 
people  all  over  the  world  to  share  each  other's  knowledge  and  experience  in 
direct  fashion. 




RESEARCH  IN  SEARCH  MODELS 


by 


W.  David  Penniman 
AT&T Bell Laboratories
600  Mountain  Avenue 
Murray  Hill,  New  Jersey  07974 
Telephone  (201)  582-2854 


Sitting  there  listening  to  the  other  speakers,  I  was  reminded  of  a 
Peanuts  cartoon  I  saw  some  time  ago  in  which  Schroeder  and  Linus  and 
Charlie  Brown  were  lying  on  their  backs  looking  up  into  the  sky  and  they 
were  each  saying  what  they  saw  in  the  cloud  formations.  Schroeder  said 
he  saw  this  montage  of  all  the  great  composers  moving  across  the  sky. 
Linus  said  it  looked  a  little  like  the  ceiling  of  the  Sistine  Chapel.  It 
finally  came  to  Charlie  Brown  and  in  his  most  typical  chagrined 
expression,  he  said  he  was  going  to  tell  about  the  doggy  and  kitty  he 
saw,  but  now  he  wasn't  sure  he  should  mention  that. 

What  I  want  to  say  to  you  may  sound  very  simple  compared  to  some  of 
the  things  we've  heard  this  morning,  but  I  think  it’s  a  necessary 
foundation  and  so  I'm  going  to  get  on  my  soapbox  and  preach. 

As Bill mentioned, I'm with AT&T. I joined them one week after
divestiture  and  I've  spent  the  last  eight  months  focusing  my  entire 
attention  on  changing  both  the  organizational  structure  and  the  cultural 
climate  of  the  group  I'm  in  charge  of.  That  is  a  disclaimer,  because 
everything  I  say  from  this  point  on  has  nothing  whatsoever  to  do  with 
AT&T Bell Laboratories, but rather it is based on work that I did before
I  joined  the  Labs.  (But  I  hope  it  will  affect  what  happens  at  the  Labs 
in  the  future). 

I  want  to  acknowledge  the  National  Library  of  Medicine  Extramural 
Grants  Program  for  funding  the  most  recent  work  that  much  of  what  I'm 
going  to  be  talking  about  is  based  upon.  I  said  it's  the  Extramural 
Program  or,  as  my  secretary  typed  in  our  Grant  acceptance  letter,  the 
Extramarital  Program  which  I  think  may  be  memorable  to  them!  The  work 
that  I'm  going  to  describe  to  you  has  been  published  in  two  different 
places.  First,  the  proceedings  of  the  1982  American  Society  for 
Information  Science  annual  meeting,  and  second  in  April  of  this  year  in 
the  ACM  Bulletin  of  the  Special  Interest  Group  on  Computer  and  Human 
Interactions.  So  it  is  available  if  any  of  you  would  like  to  see  it. 
Check  your  local  library. 

I  want  to  mention  also  why  I  got  into  this  kind  of  research  in  the 
first  place,  that  is,  research  on  how  people  actually  use  information 
systems as opposed to how the designers think they use them. I was with
Battelle for a number of years and was involved in the early design of
the BASIS System, which at that time stood for Battelle's Automated
Search Information System and since probably stands for something else.
But in any case, we began to suspect that the people were not truly using
the system in the way we had intended them to. Either they weren't using
its full capabilities or maybe they were using it in ways we hadn't
anticipated. We decided as designers that we had better find out how it
was  being  used.  So  we  developed  and  applied  an  online  monitor.  At  first 
what  we  did  was  just  to  take  the  transactions  off  the  day  file  and 
extract  out  the  ones  that  were  appropriate  to  our  application,  and  then 
start  looking  at  them.  From  that  we  went  to  the  design  of  an  on-line 
monitor  built  with  the  BASIS  system  that  captured  all  the  transactions. 
This  gave  us  a  chance  to  look  at  individual  sessions  and  to  truly 
identify  what  was  being  done.  To  paraphrase  Lord  Kelvin,  "If  you  can't 
measure  it,  your  knowledge  about  it  is  meager  and  furthermore  you 
shouldn't talk about it, if you can't measure it." We decided we should
measure  what  was  going  on  in  the  system. 

What  I  want  to  present  to  you  right  now  are  some  results  of  recent 
monitoring work on the NLM system. First of all, a little credit for the
Extramural  Grants  Program.  They  provided  us  with  transaction  data  from 
the  NLM  System.  We  were  specifically  interested  in  the  Medline  database 
in  this  case.  I  might  mention,  by  the  way,  that  this  particular  study 
was  one  of  many.  I  already  indicated  that  we  monitored  the  BASIS  system. 
I've  been  involved  in  studies  monitoring  a  variety  of  systems,  including 
the  OCLC  on-line  cataloging  system.  The  techniques  are  basically  the 
same,  and  that's  what  I  want  to  focus  on  today  using  this  as  an  example. 

I  want  to  talk  about  the  techniques  and  how  you  as  individuals  who  can 
influence  system  design  should  be  aware  of  this  technique  and  should  be 
incorporating  it. 

The  methodology  was  to  obtain  transaction  tapes  from  the  National 
Library  of  Medicine  and  to  sort  those  transaction  tapes.  They  appear  in 
chronological order and we wanted to re-sort them so we could identify
discrete  sessions  by  individual  users  and  then  look  at  what  they  did  in 
those  sessions.  We  selected  only  sessions  involving  the  Medline 
database.  (I  might  mention  that  John  Tolle  at  OCLC  has  since  analyzed 
the  Catline  interactions  in  a  similar  way,  so  we're  able  to  compare 
between  two  different  databases.  I'll  get  to  that  comparison  later  on  in 
terms  of  why  it's  valuable.)  We  then  edited  out  some  of  the  spurious 
characters  in  the  transactions,  tagged  every  transaction,  assigned 
activity  codes  to  every  transaction  (and  I'll  show  you  what  activity 
codes  were  assigned).  Then  we  analyzed  the  entire  sample  and  drew  from 
it  subsamples  to  make  comparisons.  In  this  case,  they  were  subsamples 
based  on  the  frequency  of  use  of  the  system  and  we  were  able  to  identify 
users  who  used  it  a  high  degree  of  the  time,  moderately,  and 
infrequently.  We  began  to  compare  those  subsamples,  looking  for 
differences  between  different  types  of  users.  Finally,  we  compared  our 
results  to  what  other  people  had  done  previously. 
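
The sorting and sessionizing steps described above can be sketched roughly as follows. This is only an illustration: the record layout (timestamp, user ID, command text), the 30-minute idle gap used to split sessions, and the thresholds for the frequent/moderate/infrequent groups are all assumptions made for the example, since the talk does not describe the tape format or the cutoffs actually used.

```python
from collections import Counter, defaultdict

# Hypothetical record layout: (timestamp_seconds, user_id, command_text).
# The real transaction-tape format is not described in the talk.
def split_sessions(records, gap_seconds=1800):
    """Re-sort chronological records by user and split them into sessions.

    A new session starts whenever a user is idle longer than gap_seconds
    (an assumed rule; log-on/log-off markers could be used instead).
    """
    by_user = defaultdict(list)
    for ts, user, cmd in sorted(records):
        by_user[user].append((ts, cmd))
    sessions = []
    for user, events in by_user.items():
        current = [events[0]]
        for prev, nxt in zip(events, events[1:]):
            if nxt[0] - prev[0] > gap_seconds:
                sessions.append((user, current))
                current = []
            current.append(nxt)
        sessions.append((user, current))
    return sessions

def usage_group(session_count, low=2, high=10):
    """Bucket a user by number of sessions (thresholds are illustrative)."""
    if session_count >= high:
        return "frequent"
    if session_count >= low:
        return "moderate"
    return "infrequent"

def group_users(sessions):
    """Map each user ID to a usage group based on its session count."""
    counts = Counter(user for user, _ in sessions)
    return {user: usage_group(n) for user, n in counts.items()}
```

With the sessions grouped this way, subsamples of roughly equal size can be drawn from each usage group before any comparison is made.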

How  many  sessions?  Well,  almost  40,000  sessions  were  in  the  initial 
sample,  representing  over  2  million  transactions.  Now,  I  would  say  that 
applying an on-line monitor to an interactive system provides you with
more  data  than  you'd  care  to  analyze,  so  the  real  challenge  is  in 
developing  techniques  to  analyze  that  data  in  a  meaningful  fashion.  I 
think  we  have  a  handle  on  that.  Obviously,  it  represented  a  large  number 
of  hours  of  interactive  time,  as  well.  (See  Figure  1.) 


Note  that  the  average  number  of  minutes  per  session  was  11.  In 
comparing  the  subsamples,  I  want  to  comment  on  that  a  little  later  on. 

The subsamples of frequent, moderate, and infrequent are shown in Figure
2. We tried to look for approximately equal numbers of sessions in each
subsample.  That  meant  that  since  the  heavy  users  had  more  sessions,  we 
had  fewer  heavy  users  in  the  sample  than  we  had  infrequent  users.  We 
wanted  to  try  to  have  a  balance  in  terms  of  sessions  and  a  relative 
balance  in  terms  of  transaction  pairs,  although  as  you  know,  the 
infrequent  users  had  fewer  transaction  pairs. 

Note  there's  a  relative  consistency  across  the  minutes  per  session 
between  frequent,  moderate  and  infrequent.  The  frequent  users  were  on 
about  the  same  length  of  time  per  session  as  the  infrequent  users.  But 
there  are  some  underlying  structures  in  what  the  frequent  versus 
infrequent users do that cause this to be about the same.

I said we tagged every transaction within the system. (See Figure
3.) It's important that you come up with a mutually exclusive and
exhaustive set of categories to which you can assign every transaction.
If you can boil it down to a small enough number of these categories, you
can begin to analyze in gross terms how people are using the system.
Then the data doesn't weigh you down; the data is actually useful.
That's the set of categories that we applied against the NLM system. I
might  mention  that  both  Chris  Borgman  and  Carol  Fenichel  have  been 
applying  this  same  kind  of  technique.  Charles  Hildreth  has,  as  well,  and 
a  number  of  other  people  are  now,  too.  So  we  begin  to  be  able  to  compare 
findings  across  systems. 
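
One way to guarantee that a category scheme is both mutually exclusive and exhaustive is to apply an ordered list of rules, take the first match, and fall back to a catch-all code. The sketch below assumes this approach; the category names and the command patterns are hypothetical stand-ins, not the codes shown in Figure 3 or the actual NLM command syntax.

```python
import re

# Hypothetical rules, checked in order. The first match wins, so a
# transaction can never receive two codes (mutually exclusive).
RULES = [
    ("DISPLAY", re.compile(r"^(print|display)\b", re.IGNORECASE)),
    ("BOOLEAN", re.compile(r"\b(and|or|not)\b", re.IGNORECASE)),
    ("TERM", re.compile(r"\S")),
]

def activity_code(text):
    """Assign exactly one activity code to a transaction string."""
    for code, pattern in RULES:
        if pattern.search(text):
            return code
    # Catch-all, so every transaction gets some code (exhaustive).
    return "OTHER"
```

For example, `activity_code("heart attack AND bypass")` yields `"BOOLEAN"` under these made-up rules, while an empty line falls through to `"OTHER"`.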

Figure  4  shows  profiles  of  the  three  different  user 
groups — frequent,  moderate,  and  infrequent — on  some  very  basic  activities 
that  are  performed  in  the  system.  What  we  looked  at  was  the  use  of  the 
term search, advanced term search, Boolean and display across those three
subgroups.  We  normalized  the  data  and  tried  to  compare  it  between  the 
frequent,  infrequent,  and  moderate  groups.  So  what  this  graph  says  is 
that  for  every  one  single  term  search  entered  by  an  infrequent  or 
frequent  searcher,  the  moderate  enters  about  1-1/2  terms.  For  every 
advanced  term  search  entered  by  the  infrequent,  the  frequent  enters  about 
2.  So  you  can  see  there's  a  clear  distinction  between  the  profiles  of 
the  frequent,  infrequent,  and  moderate  users.  From  this  you  can  begin  to 
draw  some  implications  as  to  not  only  how  they  use  the  system  but  how 
they  progress  as  they  become  more  experienced  in  the  use  of  the  system. 
There  were  a  large  number  of  findings  based  on  these  kinds  of 
comparisons. 
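
The kind of normalized comparison described above can be sketched as a ratio of per-session command rates against a baseline group. The talk does not say exactly how the data were normalized, so dividing by session count and then by the baseline group's rate is an assumption, and the counts in the usage note are invented purely to show the arithmetic.

```python
def per_session_rates(counts, n_sessions):
    """Convert raw command counts for a group into counts per session."""
    return {cat: n / n_sessions for cat, n in counts.items()}

def relative_profile(group_rates, baseline_rates):
    """Express each category's rate as a multiple of the baseline group's rate."""
    return {cat: group_rates[cat] / baseline_rates[cat] for cat in baseline_rates}

# Invented counts, for illustration only (not data from the study).
infrequent = per_session_rates({"term_search": 300, "boolean": 200}, n_sessions=100)
moderate = per_session_rates({"term_search": 450, "boolean": 200}, n_sessions=100)
profile = relative_profile(moderate, infrequent)
```

Here `profile["term_search"]` comes out to 1.5, which would be read as "for every term search entered by an infrequent searcher, the moderate searcher enters about one and a half."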

I'm  not  going  to  go  into  a  lot  of  detail  about  all  of  those 
findings.  You’ve  already  heard  about  some  of  them  by  other  speakers,  but 
let  me  just  mention  a  few  on  this  list.  (See  Figure  5.)  First  of  all, 
there were log-on problems. That seems to be widely acknowledged.
However, when we submitted the results to the National Library of
Medicine, they did not acknowledge that there were log-on problems. I
think that was partly because it was an implied criticism of the training
program.  Even  though  we  showed  them  the  data,  they  said,  "well,  we  think 
there  may  be  errors  in  your  analysis  program  or  at  most,  problems  with 
the  communications  line  that  cause  spurious  data."  Mostly,  they  were 
willing  to  blame  the  communications  channels.  It  was  only  after  Michael 
Cooper  got  comparable  results  with  a  similar  subsample  from  NLM  that  I 
believe  they  accepted  the  idea  that  in  fact  there  are  log-on  problems. 


Almost seventy percent of the IDs that we sorted out of the samples
were spurious IDs. So that means that people are having a tough time
getting into the system. Why? Well we went back into our data and took
a closer look at a selected subsample and we found one very simple reason
that accounted for many of the errors.

AUDIENCE  COMMENT 

Just  let  me  interject  what  makes  it  worse  is  that  the  system  drops 
you  off  if  you  made  a  simple  error.  That  is  just  most  frustrating. 

DR.  PENNIMAN 

Yes. One of the problems that's very common and generated a lot of
these sessions that ended with one or two transactions was that people
were trying to use a Tymnet format for a log-on procedure while on
Telenet, or a Telenet format while on Tymnet. Why? They've got two different
systems to keep straight. They forget. They enter one and it's an
inappropriate  format  for  the  other.  You  can  capture  that  data.  You  can 
tell  what's  going  on.  You  can  help  the  user.  But  I  feel  like  we  have 
something  here  that's  really  powerful  and  we're  not  taking  advantage  of 
it. 

Another  problem  I'd  like  to  mention — use  of  the  display  command. 
There's  been  a  running  debate  as  to  whether  or  not  people  that  are  newly 
trained  on  the  system  are  really  doing  interactive  searching  or 
fast-batch.  Are  they  using  the  display  and  interactive  capability  of  the 
system  to  look  at  a  few  documents,  then  go  back  in  and  reformulate  their 
search  strategy,  and  then  go  back  and  search  again  and  look  at  some  more 
documents? There was a lot of conjecture and there were some
suppositions about whether or not people were doing that. It was very
simple to answer the question. We just looked at some of the sessions,
looked  at  whether  or  not  displays  were  followed  by  additional  Boolean 
searches,  and  in  fact  they  were.  So  we  feel  we  were  able  to  put  at  rest 
something  that  was  a  running  debate  with  a  lot  of  very  opinionated  people 
saying  what  they  thought  was  going  on.  We  were  able  to  show  that  both 
infrequent  and  frequent  users  were  using  the  display  command  embedded 
between  Boolean  search  commands. 
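
A check of this kind is simple to express once each session has been reduced to a sequence of activity codes. The sketch below assumes codes named "BOOLEAN" and "DISPLAY" (hypothetical labels) and tests whether a display occurs somewhere between two Boolean searches in the same session.

```python
def display_between_booleans(codes):
    """True if a session contains BOOLEAN ... DISPLAY ... BOOLEAN in that order."""
    state = 0  # 0: waiting for first BOOLEAN, 1: waiting for DISPLAY, 2: waiting for BOOLEAN
    for code in codes:
        if state == 0 and code == "BOOLEAN":
            state = 1
        elif state == 1 and code == "DISPLAY":
            state = 2
        elif state == 2 and code == "BOOLEAN":
            return True
    return False

def interactive_share(sessions):
    """Fraction of sessions showing the search, display, search-again pattern."""
    if not sessions:
        return 0.0
    return sum(display_between_booleans(s) for s in sessions) / len(sessions)
```

Computing `interactive_share` separately for the frequent and infrequent subsamples would give the sort of comparison described here.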

Next  I'd  like  to  mention  the  80/20  rule.  For  those  of  you  who  are 
familiar with content analysis and type/token vocabulary, I could explain
it  quickly.  Let  me  just  put  it  this  way.  A  very  small  number  of  the 
total  possible  types  of  transactions  account  for  a  very  large  number  of 
the  transactions  actually  occurring  in  the  sample.  In  other  words,  20 
percent  of  the  types  of  events  observed  account  for  80  percent  of  the 
events  in  the  sample.  That's  true  for  strings  as  short  as  2  or  3 
commands  in  the  sequence,  which  means  that  there's  not  a  great  deal  of 
diversity  of  use  in  terms  of  the  interaction  or  the  commands  or 
capability  in  the  system. 
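
The concentration described here can be measured by counting distinct command sequences (types) against their total occurrences (tokens). The following sketch assumes sessions are lists of activity codes and counts sequences of length n; the function names and the choice to take the top 20 percent of observed types are illustrative assumptions.

```python
from collections import Counter

def ngrams(codes, n):
    """All consecutive runs of n codes within one session."""
    return [tuple(codes[i:i + n]) for i in range(len(codes) - n + 1)]

def coverage_of_top_types(sessions, n=2, top_fraction=0.2):
    """Share of all observed n-command sequences accounted for by the
    most frequent top_fraction of distinct sequence types."""
    counts = Counter(g for s in sessions for g in ngrams(s, n))
    total = sum(counts.values())
    if total == 0:
        return 0.0
    k = max(1, round(top_fraction * len(counts)))
    top = sum(c for _, c in counts.most_common(k))
    return top / total
```

A result near 0.8 for `top_fraction=0.2` would correspond to the 80/20 pattern; lower values would indicate more diverse use of the command set.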


The  use  of  certain  commands  can  distinguish  between  the  groups, 
which  I  showed  you  on  the  graph  before.  Also,  command  pairs  can  be  used 
to  distinguish  between  different  groups.  Now,  that's  very  powerful,  not 
only  because  you  can  go  back  after  the  fact  and  look  at  it,  but  also 
because  you  can  do  it  on  the  fly.  Since  you  can  do  it  on  the  fly,  you 
can  provide  the  users  with  adaptive  prompting  that's  tailored  to  what 
they’re  doing  in  the  system  and  give  them  instruction.  That's  what  IIDA 
was  all  about  that  Charlie  Meadow  was  working  on — individualized 
instruction  in  data  access  including  prompting. 

Novices  search  more  slowly,  but  you  notice  that  the  time  of  the 
session  is  about  the  same,  which  means  that  they  use  fewer  commands  and 
spend  more  time  on  each  command.  Charlie  Meadow  has  suggested — I  don't 
know that it's ever been verified, but I think it's an interesting area
for  study — that  the  limit  on  session  length  has  nothing  to  do  with  how 
many  commands  you  enter,  but  just  your  tolerance  for  sitting  at  a 
terminal  doing  a  single  task  for  a  certain  number  of  minutes. 

The  way  I  define  errors  is  different  than  the  way  it's  been  defined  by 
Carol  Fenichel  in  some  of  her  studies.  I  simply  looked  at  what  people 
typed  in  immediately  before  entering  a  correction.  This  indicated  at  a 
minimum  where  they  were  having  typing  problems.  The  most  frequent  errors 
occurred  in  the  Boolean  entries. 

Frequent  users  did  not  use  advanced  select  commands.  Even  the 
frequent  users  were  not  making  full  use  of  what  was  available  within  the 
system.  I  might  mention  that  in  the  published  results  of  this  study,  I 
compared  these  findings  with  findings  from  Carol  Fenichel,  Janet  Chapman 
and  Judy  Wanger  as  well  as  other  people  who  have  studied  systems  in  a 
similar  manner.  In  some  cases  we  were  able  to  verify  their  findings;  in 
other  cases  we  think  we  refuted  the  findings,  particularly  when  ours  were 
based  on  hard  data  and  theirs  were  based  on  a  small  sample  or  interviews. 
We  think  that  this  data  is  pretty  solid  based  on  what  people  actually  did 
in  the  system. 

As  far  as  the  conclusions  of  that  study  are  concerned,  we  think  the 
methodology  allows  for  comparison  across  systems  and  databases — and  I 
might  add  also,  researchers.  The  only  way  you  can  gain  credibility  in 
research  is  to  have  results  that  can  be  replicated.  This  technique 
allows  for  replication.  (See  Figure  6.) 

I  am  talking  about  our  studies  from  a  research  standpoint.  When  I 
go  back  to  the  podium  I'm  going  to  talk  about  it  from  an  applications 
standpoint  in  terms  of  systems  that  are  used  every  day.  The  methodology 
allows  for  testing  against  previous  results.  I  have  to  admit  the 
methodology  needs  further  refinement,  particularly  in  terms  of  some  of 
the  models  that  we've  been  building.  For  those  of  you  who  are 
interested,  the  models  are  stochastic  process  models  involving  transition 
matrices  that  show  the  probability  that  a  user  will  go  from  one  state  to 
another.  We  were  able  to  build  those  on  the  basis  of  empirical  data. 
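
To illustrate the transition-matrix idea just described, here is a minimal Python sketch. It assumes each session has already been reduced to a sequence of state labels; the labels and the two sample sessions are invented for illustration and are not data from the study. It estimates a first-order matrix only, which may be simpler than the models the speaker built.

```python
from collections import defaultdict


def transition_matrix(sessions):
    """Estimate first-order transition probabilities from observed sessions.

    sessions: iterable of lists of state labels (hypothetical activity codes).
    Returns a nested dict {from_state: {to_state: probability}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for session in sessions:
        # Count every adjacent pair (current state, next state).
        for current, following in zip(session, session[1:]):
            counts[current][following] += 1

    matrix = {}
    for state, row in counts.items():
        total = sum(row.values())
        # Normalize each row so its probabilities sum to 1.
        matrix[state] = {nxt: n / total for nxt, n in row.items()}
    return matrix


# Invented sample sessions, for illustration only.
sessions = [
    ["BEGIN", "TERM", "BOOLEAN", "DISPLAY", "END"],
    ["BEGIN", "TERM", "TERM", "DISPLAY", "DISPLAY", "END"],
]
matrix = transition_matrix(sessions)
```

Each row of the result gives the estimated probability of moving from one state to each state observed after it, so rows from different user groups or systems can be compared directly.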




The extension of the methodology to online public access catalogs
(OPACs) already has been done. Charles Hildreth will tell you more
about that and Chris Borgman has already mentioned similar studies. I'm
glad to see that it's happening with OPACs early on, so that we aren't
going  to  continue  to  design  systems  under  false  assumptions  about  how 
they  are  being  used. 

Now  I’d  like  to  go  back  to  the  podium  and  make  some  comments  based 
not  only  on  that  research  but  also  on  my  general  feeling  about  this  whole 
issue  of  improving  the  use  of  information  systems. 

First  of  all,  I  hear  a  continual  reference  to  the  user/system 
interface.  If  it’s  the  user/computer  system  interface,  then  I  think  that 
may  be  an  appropriate  terminology.  If  we’re  talking  about  the 
user/information  system  interface,  then  I  really  object.  The  system 
boundary  for  an  information  system  is  drawn  somewhere  behind  the  user, 
not  between  the  user  and  that  system  and  unless  we  take  that  into  account 
in  our  design  of  systems  (that  the  user  is  part  of  the  system,  not 
outside  the  boundary  of  it),  we’re  going  to  continue  to  design  systems 
that  really  don’t  fulfill  their  intention.  And  that’s  not  just 
semantics.  The  more  we  talk  about  it  that  way,  the  more  we  think  about 
it  that  way.  So  just  as  you  hear  more  and  more  people  saying  "his  or 
hers"  or  "chairperson,"  I  think  we  ought  to  be  more  careful  about  the  way 
we  discuss  system  boundaries,  as  well. 

I  remember  the  intent  of  the  early  online  systems.  I  indicated  I 
was  involved  in  the  development  of  the  BASIS  system.  Some  of  you  out 
there  look  at  least  as  old  as  I  am,  if  not  older,  so  you  probably 
remember  also  information  analysis  centers.  I  cut  my  teeth  on  those  and 
the  dollars  that  went  into  information  analysis  centers  were  diverted  to 
on-line  retrieval  systems.  Why?  Because  there  was  some  conviction  that 
on-line  retrieval  systems  would  provide  better  service  and  reduce  costs 
from  what  was  currently  being  spent  in  the  very  labor-intensive 
information  analysis  centers.  I  hope  there  was  a  conviction  of  that.  I 
hope  that  was  behind  the  decision  and  not  just  becoming  enamoured  with 
technology.  In  any  case,  I  think  that  we  have  to  ask  ourselves  whether 
that  intent  has  truly  been  fulfilled,  or  if  we  have  just  spawned  a  new 
series  of  professions  and  a  new  series  of  specialties  without  really 
improving  service. 

I  would  also  want  to  carry  that  remark  further  to  online  public 
access  catalogs.  If  we  do  the  same  thing  in  the  area  of  online  catalogs 
that  we  did  with  regard  to  online  information  retrieval  systems,  we  will 
essentially  be  creating  card  files  which  the  user  will  not  be  able  to 
open.  The  drawers  will  seem  stuck  shut.  When  they  finally  do  manage  to 
get  the  drawer  pulled  open,  the  cards  will  seem  to  be  stuck  together  and 
the  poor  user  won’t  be  able  to  get  them  apart.  They  aren't  going  to 
accept  that  and  where  are  they  going  to  go  when  they  can't  even  get  the 
drawer  open?  That's  the  analogy  I  see  with  some  of  the  systems  that  we 
are  liable  to  design  unless  we  take  into  account  the  human  element. 


Another thing in terms of terminology I'd like to mention. I have
an SDI profile at Bell Labs, the same one I had at OCLC, and "user
friendly" was one phrase I couldn't use any more because I got so many
hits. Everybody is talking now about user friendly software. I don't
know who would claim that their software isn't user friendly. I also
think we've heard of terminal-friendly users instead of user-friendly
terminals. I don't care whether it's friendly anymore and I think back
to the time when I was in the Army and going through basic training. My
drill  sergeant  certainly  wasn't  user  friendly,  but  he  was  user 
informative.  He  was  firm,  he  was  direct.  He  was  unforgiving,  but  he 
made  it  very  clear  as  to  what  was  required  and  I  knew  what  was  required 
after  a  very  short  time  and  I  did  it.  He  was  instructive.  Maybe  we 
ought  to  stop  talking  about  being  friendly,  which  is  sort  of  like 
graceful  and  forgiving,  and  start  talking  about  being  user  informative. 
How?  By  tracking  what  people  do  and  instructing  them  on  what  to  do. 
That's  where  I  think  the  kind  of  tool  that  I've  just  described  to  you  can 
be  a  great  help. 

One  final  comment  I  want  to  make  and  I  know  this  is  going  to  sound 
like  a  pitch  for  AT&T,  but  it's  not.  Maybe  it's  a  pitch  for  the  Apple 
Computer Company. Apple has an ad for the Macintosh in which they state
"Computers won't really be widely used until they're as easy to use as
the telephone," and then you turn to the next page and there's the
Macintosh. I think they have found something that is going to give them a
great deal of success on the market. But isn't it amazing that they had
to turn to icons and windows and a mouse and all of that to make it
successful? I think we had success right within our grasp with the very
simple  terminal  and  star-type  network  with  interactive  systems  years  ago. 
We  had  something  that  was  as  easy  to  use  as  the  telephone.  I  think  we 
blew  it  and  I  hope  we  don't  make  the  same  mistake  by  creating  more  and 
more  complexity,  thus  making  it  harder  for  the  end  user  to  get  the 
information  that  they're  seeking. 

QUESTION 

Did  you  run  a  frequency  analysis  on  the  terms  that  users  used  at  the 
National  Library  of  Medicine  most  often? 

DR.  PENNIMAN 

We  didn't  do  it  in  that  system,  but  we  did  do  it  in  another  system  that  I 
monitored  and  analyzed  some  time  before.  Certainly  it's  possible  to  do 
it.  You  have  the  data  there,  you  have  the  first  57  characters  of 
information.  That’s  what’s  captured  and  one  of  our  recommendations  was 
that  they  capture  more  because  when  you  string  terms  together  you  lose 
the  end  of  the  complex  Boolean  search.  I  can't  answer  the  question  for 
NLM,  but  I  can  tell  you  that  people  were  using  the  system  in  a  way  we 
never expected. They were entering search terms that were in display-only
fields, not searchable fields, and from that we were able to conclude
that we had better make some additional fields in the records
searchable. In the case of NLM, the one thing we did conclude was that
the most likely entry to result in a null response was a single term
entry.  What  that  says  is  that  the  searcher  is  still  trying  to  find  a 
common  vocabulary  with  the  index  system.  That's  where  I  think  a  great 
deal  of  help  is  needed,  as  well.  So  while  we  didn't  look  at  individual 
terms,  we  looked  at  what  was  most  likely  to  result  in  a  null  response  and 
it  was  individual  terms. 
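
The kind of tabulation behind that conclusion can be sketched as follows. This is only an illustration under stated assumptions: each logged entry is assumed to be available as a list of search terms plus a hit count, and the sample entries are invented, not taken from the NLM data.

```python
from collections import Counter


def null_rate_by_term_count(entries):
    """Share of entries returning zero hits, grouped by number of terms.

    entries: iterable of (terms, hits) pairs, where terms is a list of
    search terms and hits is the number of records retrieved.
    """
    totals = Counter()
    nulls = Counter()
    for terms, hits in entries:
        n_terms = len(terms)
        totals[n_terms] += 1
        if hits == 0:
            nulls[n_terms] += 1
    return {n: nulls[n] / totals[n] for n in totals}


# Invented sample entries, for illustration only.
entries = [
    (["neoplasms"], 0),
    (["cancer"], 0),
    (["aspirin"], 12),
    (["aspirin", "headache"], 5),
    (["insulin", "diabetes"], 0),
    (["insulin", "diabetes"], 3),
]
rates = null_rate_by_term_count(entries)
```

Here `rates[1]` is the null-response rate for single-term entries and `rates[2]` the rate for two-term entries, which is the comparison the answer describes.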

QUESTION 

Permit me then to comment on what is implied by what you just said. All
computer systems, including those abroad, capture this type of information as
a matter of business, which means that as we start to search overseas
information centers, especially for the Department of Defense, we
leave an indelible signature as to what we're after. Now, intelligent
gateways can make the searcher anonymous. If the contract with a foreign
host is from the Gateway, then those authorized to use that foreign host
through the Gateway are not known to the target computers. That
is another reason why Gateways are probably something to be
considered carefully.

DR.  PENNIMAN 

Yes,  you've  raised  the  issue  of  privacy  and  I  think  that's  an 
important one. In some of my previous publications I've addressed that.
I would hope that the privacy issue doesn't obscure the fact that you can
analyze this data at a gross level; you can put counters on terms in the
index  of  the  online  system.  It  isn't  necessary  to  know  what  a  specific 
user  is  searching  for  but  it's  interesting  and  informative  and  certainly 
necessary  for  the  marketer  of  these  databases  to  know  what  terms  are 
being  used  most  frequently.  And  that,  I  would  argue,  is  not  an  invasion 
of anyone's privacy, to put counters on the terms as they appear in the
index.  It's  also  important  to  know  that  terms  are  being  entered  that  do 
not  appear  in  the  index. 
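
A minimal sketch of such counters, assuming the index vocabulary and the stream of entered terms are both available as plain strings (the sample terms are invented). No user identifier is stored anywhere, which is the point being made.

```python
from collections import Counter


def tally_terms(entered_terms, index_terms):
    """Count term usage in aggregate, with no user identity attached.

    Returns two counters: terms found in the index, and terms that
    searchers entered but that do not appear in the index.
    """
    index = {term.lower() for term in index_terms}
    in_index = Counter()
    not_in_index = Counter()
    for term in entered_terms:
        key = term.strip().lower()
        if key in index:
            in_index[key] += 1
        else:
            not_in_index[key] += 1
    return in_index, not_in_index


# Invented vocabulary and entries, for illustration only.
index_terms = ["aspirin", "hypertension", "insulin"]
entered = ["Aspirin", "insulin", "aspirin", "high blood pressure", "Insulin "]
hits, misses = tally_terms(entered, index_terms)
```

The second counter is the list of terms entered that do not appear in the index, which points to gaps between the searchers' vocabulary and the indexing vocabulary.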

QUESTION 

Could  I  come  in  on  that  on  European  systems  for  a  moment,  please? 
This is a very great issue you've expressed well, but it's a little
relieved, I think, in that most European host systems have a clause in
their contract that explicitly prohibits them from storing any data about
your usage for any longer than is necessary for serving your request.
For example, I've got to keep a record of the documents you want printed
off-line so I can print them off-line, or for billing purposes, and this is
actually a legal requirement in most European countries. So I hope that
makes  you  feel  a  little  easier  about  it. 


DR.  PENNIMAN 


That's exactly the point that I'm concerned about, though, that
those  clauses  in  the  contract  may  preclude  individuals  from  actually 
capturing  data  at  the  gross  level  that  can  be  used  for  system  refinement. 
I  don't  think  contracts  should  preclude  such  studies.  I  know  that  I  had 
a  running  debate  with  one  of  the  leaders  of  a  commercial  vendor  about 
whether or not their system captured the data online. Clearly I know of
no operating system that doesn't keep a log for recovery
purposes.

COMMENTS 

I'm from NLM and I think I probably should point out that NLM does
destroy the logs after a very short period of time, a matter of days or a
very few weeks. And I would guess that in your case you
probably had special permission from the access codes to allow NLM to
record the data.

DR.  PENNIMAN 

The  data  that  was  provided  to  us  had  the  access  codes  encrypted  so 
we  had  no  knowledge  of  the  individuals  doing  the  searches,  but  they  were 
encrypted  in  such  a  way  that  we  could  group  them  by  individual  access 
codes.  They  then  provided  us  with  a  key  to  the  encryption  that  told  us 
the type of organization from which that code came, but not the
individual or the specific organization from which the searcher
came.
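
One way to get the same effect today is sketched below. It is an assumption-laden illustration, not a description of what NLM did: it uses a keyed hash (HMAC-SHA-256) in place of whatever encryption was actually applied, and the access codes, key, and organization types are all invented.

```python
import hashlib
import hmac
from collections import defaultdict

# Hypothetical secret, held only by the data provider.
SECRET_KEY = b"held-by-the-data-provider-only"


def pseudonym(access_code, key=SECRET_KEY):
    """Map an access code to a stable token that does not reveal the code."""
    digest = hmac.new(key, access_code.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]


# Invented raw log and organization lookup, for illustration only.
raw_log = [("LIB-0042", "DISPLAY"), ("MED-0007", "TERM"), ("LIB-0042", "BOOLEAN")]
org_types = {"LIB-0042": "academic library", "MED-0007": "hospital"}

# What the researcher receives: tokens instead of codes, plus a key that
# maps each token to a type of organization only.
released_log = [(pseudonym(code), action) for code, action in raw_log]
released_key = {pseudonym(code): kind for code, kind in org_types.items()}

# Transactions can still be grouped by (pseudonymous) access code.
by_user = defaultdict(list)
for token, action in released_log:
    by_user[token].append(action)
```

Because the same code always maps to the same token, sessions can be grouped per access code, while the released key exposes only the type of organization.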

QUESTION 

Do  you  know  whether  more  than  one  individual  had  access  to  a 
particular  access  code? 

DR.  PENNIMAN 

We  had  no  way  of  knowing  whether  two  people  were  sharing  the  same 
code,  and  that,  of  course,  is  a  confounding  point  in  the  data;  however, 
even  if  we  were  able  to  look  at  names  we  would  have  no  indication  that  it 
didn't  represent  two  or  more  people.  But  I  still  see  that  as  no  reason 
to  throw  this  out  as  a  way  of  measuring  what's  going  on.  One  confirming 
point  is  that  Mike  Cooper  got  similar  data  because  he  was  there  as  an 
intern  and  got  a  release.  But  yes,  there  was  authorized  release  for  both 
of  these  studies. 


QUESTION 


John Lawson from NASA. We're going through considerable redesign on
our online system. Looking at beginning users, intermediate users, and
expert users, can you profile what, in general, a beginning user does?
What does an intermediate user do? What does an expert user do or not
do?

DR.  PENNIMAN 

Yes,  but  it's  more  complex  in  terms  of  profile.  Let  me  show  you  an 
example  of  what  you  can  do  rather  than  try  to  answer  specifically  because 
it's  a  fairly  complicated  issue.  I  have  another  transparency  (Figure  7) 
that  I  pulled  out  for  the  sake  of  time.  I  showed  you  that  we  assigned 
codes  to  every  one  of  the  transactions.  We  were  able  to  look  at 
sequences  of  transactions  for  frequent,  moderate,  and  infrequent  users 
and then compare which things occur across all three or two of the three,
and which kinds of patterns occur only in a single user category. We were
able from that to determine, as an example, that the infrequent user is
likely to sit there and hit the display button and go through repeated
displays, one after the other. The frequent user on the other hand is
not likely to do that but will batch off the printout or just look at a
few and then go on. That's one example of what the infrequent user does.
I  would  argue  that  when  you  see  that  happening  in  the  system,  there  ought 
to  be  a  prompt  built  in  that  gives  the  individual  an  option  to  learn 
about  the  off-line  print  command  because  they  may  not  know  about  it, 
since  they're  continually  hitting  the  display  command. 
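
The comparison and the suggested prompt can be sketched together. This is a minimal illustration under assumptions: the numeric codes are hypothetical (8 is assumed here to stand for a display command), the sessions are invented, and the four-in-a-row threshold is an arbitrary choice rather than a finding of the study.

```python
from collections import Counter


def pattern_counts(sessions, length=4):
    """Count every run of `length` consecutive transaction codes."""
    counts = Counter()
    for codes in sessions:
        for i in range(len(codes) - length + 1):
            counts[tuple(codes[i:i + length])] += 1
    return counts


def should_offer_print_hint(recent_codes, display_code=8, run_length=4):
    """True when the last `run_length` transactions were all displays."""
    tail = recent_codes[-run_length:]
    return len(tail) == run_length and all(c == display_code for c in tail)


# Invented sessions per user group; codes are hypothetical.
groups = {
    "frequent": [[1, 1, 1, 1, 8, 1], [1, 1, 1, 1, 6]],
    "infrequent": [[1, 8, 8, 8, 8, 8], [1, 1, 1, 1, 8]],
}
counts = {name: pattern_counts(sessions) for name, sessions in groups.items()}

# Patterns seen in both groups versus patterns seen only among infrequent users.
shared = set(counts["frequent"]) & set(counts["infrequent"])
only_infrequent = set(counts["infrequent"]) - set(counts["frequent"])
```

A system could call `should_offer_print_hint` after each transaction and, when it returns true, offer the option to learn about the off-line print command.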

QUESTION 

Why  doesn't  the  frequent  user  use  all  the  advanced  features?  I  know 
that's  not  within  your  study,  but  that  would  seem  to  be  a  natural. 

DR.  PENNIMAN 

Right,  and  that's  a  natural  question  that  comes  out  of  this  that  I 
think  NLM  should  then  go  back  and  pursue.  They  should  look  at  the 
training  program  and  how  features  are  taught.  They  should  grab  some  of 
their  frequent  users,  or  their  known  heavy  users,  and  talk  to  them. 

COMMENT 

We  need  to  know  your  controls.  Otherwise  we  could  be  going  with 
something  without  the  right  ending. 

DR.  PENNIMAN 

Yes,  there  are  shortcomings  in  this  approach,  but  my  major  point  is 
that  this  gives  you  some  signs,  some  pointers  of  where  you  ought  to  look. 
Just  as  the  fact  that  we  ended  up  with  all  those  spurious  IDs  was  a  very 
strong  pointer  that  there  were  log-on  problems.  Now  the  problem  is 
accepted  as  real. 


QUESTION 


I sense a danger and have a couple of fears about learning how
different classes of users actually use today's existing systems, whether
novice, intermediate, occasional user, or the expert experienced user.
It tends to make some designers think that we have to create two or three
levels  of  interface  to  be  entered  at  the  beginning  rather  than  looking 
at,  as  I  know  you  believe  and  have  written  about,  dynamic  adaptive 
interfaces.  Any  given  user,  and  I  include  myself  in  this,  on  a  given 
system  within  the  same  session  is  going  to  have  varying  ability  from 
expert  to  novice,  depending  on  how  much  coffee  he's  had  or  how  much  sleep 
he's  had  the  night  before  or  what  other  systems  he's  used  in  the  previous 
24  hours.  Experience  is  a  rubber  band  continuum  for  any  searcher.  We 
should  keep  that  in  mind  and  not  go  too  far  with  what  we  learn  about 
existing  pre-defined  classes  of  users  on  today's  rather  unadvanced, 
unadaptive  systems. 

DR.  PENNIMAN 

That's  a  good  point  and  it  relates  to  something  Carol  said  about 
having  a  bicycle  with  training  wheels  you  can't  take  off.  On  the  other 
hand,  it  would  be  nice  to  have  a  bicycle  which  when  it  started  to  tip 
over,  all  of  a  sudden  the  training  wheels  reappeared.  And  that  I  think 
we  can  do  with  adaptive  prompting. 

QUESTION 

We are capturing the log and sequences for a different reason, mainly
to trace possible unauthorized attempts at access. We do so by
looking at the means by which people are coming in (short of having a 9-11
box, which you can get, which can tell you from which telephone the call
is placed). If you create an inverted table of such log-on attempts,
you'll very quickly find that about 15 percent of the log-ons are not by
authorized users. And you may wish to know, as I am sure you do know,
that Moscow has access through the International Institute for Applied
Systems Analysis in Austria to all international networks, which means a
person in Moscow today can search any one of your public systems.

DR.  PENNIMAN 

Viktor,  I  spent  a  year  at  that  Institute  in  Austria  and  I  know  what 
they're  trying  to  do  in  terms  of  becoming  a  gateway  between  the  East  and 
the  West,  and  you're  right.  They  are  trying  to  provide  that  interchange 
of  information.  The  thing  I’m  concerned  about  is  that  you  continue  to 
raise  this  specter  in  terms  of  privacy  and  security.  I  think  it  can  be 
addressed  and  I  think  it  can  be  addressed  in  a  rational  way.  I  was  going 
to mention that a paper by Wayne Dominick and me in Information
Processing & Management, January 1980, has that privacy issue addressed in
a fairly structured and rational way. I hope that you will consider that
rather than whether detente is at its peak or declining at this stage
regarding our exchange of technical information with Eastern bloc
countries.

Thank  you. 




REFERENCES 


Penniman, W.D., "Modeling and Evaluation of Online User Behavior,"
Proceedings of the ASIS Annual Meeting, Volume 19, 1982.

Penniman, W.D., "A Methodology for Evaluating Interactive System
Usage," SIGCHI Bulletin, Vol. 15, 4, April 1984, pp. 6-11.

Penniman,  W.D.,  and  Dominick,  W.D.,  "Monitoring  and  Evaluation  of 
Online  Information  System  Usage,"  Information  Processing  & 
Management,  Vol.  16,  1,  January  1980,  pp.  17-35. 


FIGURE  1 


TOTAL SAMPLE SUMMARY


                     TOTAL        AVG PER USER

SESSIONS             39,330       25.9
TRANSACTIONS         2,104,977    1385.8
TIME (HRS:MIN)       7274:12      4:47
TRANS/SESSION        -            53.7
MINUTES/SESSION      -            11.06
TRANS/MIN            -            4.8


FIGURE  2 


SUBSAMPLE CHARACTERISTICS


                       FREQUENT   MODERATE   INFREQUENT

USERS                  14         46         149
SESSIONS               1306       1223       1044
TRANSACTION PAIRS      41208      40333      25464
TIME (HR:MIN)          296:11     313:59     210:46
TRANS. PAIRS/SESSION   31.6       33         24.4
TRANS. PAIRS/MINUTE    2.31       2.14       2.01
SESSIONS/USER          93.3       26.6       7.0
MINUTES/SESSION        13.37      15.24      12.07


FIGURE  3 


ACTIVITY  CODE  MAPPING 

DESCRIPTION

NULL
STORESEARCH
ERROR
NEUTRAL
BEGIN
DICTIONARY
TERM
ADVANCED TERM
BOOLEAN
DISPLAY
END
OFF SEARCH
PRINT OFF-LINE


TERM SEARCH    ADVANCED TERM SEARCH    BOOLEAN SEARCH


F  =  Frequent 
M  =  Moderate 
I  =  Infrequent 


FIGURE  5 


SUMMARY  OF  RESULTS 

o  LOGON  PROBLEMS 
o  USE  OF  DISPLAY  COMMAND 
o  80/20  RULE 

o  SPECIFICITY  DECREASES  FREQUENCY 
o  FREQUENT  SEQUENCES  SIMILAR  ACROSS  GROUPS 
o  USE  OF  CERTAIN  COMMANDS  DISTINGUISH  GROUPS 
o  COMMAND  PAIRS  DISTINGUISH  GROUPS 
o NOVICES SEARCH MORE SLOWLY
o  ERRORS  OCCUR  ACROSS  ALL  GROUPS 
o  FREQUENT  USERS  EMPLOY  MORE  COMMANDS  AND  TIME 
o  FREQUENT  USERS  EMPLOY  MORE  COMPLEX  STRATEGIES 
o  LONGER  STRINGS  PROVIDE  MORE  UNIQUE  STRINGS 
o  LONG  STRINGS  NOT  PREDICTED  FROM  SHORT  STRINGS 
o  INFREQUENT  USERS  EMPLOY  LONG  DISPLAY  SEQUENCES 
o  FREQUENT  USERS  DID  NOT  USE  ADVANCED  SELECT  COMMANDS 


FIGURE  6 


CONCLUSIONS 


METHODOLOGY ALLOWS FOR COMPARISON
ACROSS  SYSTEMS  AND  DATA  BASES 


METHODOLOGY  ALLOWS  TESTING  OF 
PREVIOUS  RESULTS 


METHODOLOGY  COULD  STAND  FURTHER 
REFINEMENT 


EXTENSION  OF  METHODOLOGY  TO  OPACS 
IS  PROMISING 


FIGURE  7 


COMPARISON OF MOST FREQUENT PATTERNS


FREQUENT       MODERATE       INFREQUENT

1-1-1-1**      1-1-1-7**      1-1-7-1**
1-1-1-8**      1-1-1-8**      8-8-8-8
6-1-1-1**      1-8-1-8*       1-1-1-8**
1-1-1-6        1-1-8-1**      5-5-5-5*
1-1-6-8        1-8-1-1**      7-7-8-7**
1-1-8-1**      5-1-1-1**      7-8-7-8*
1-8-1-1**      5-5-5-5*       7-8-7-7**
5-1-7-1**      8-7-8-7        5-1-1-1**
(illegible)    6-1-1-1**      6-1-1-1**
7-6-8-1        5-5-1-1        5-7-5-7

*  OCCURS IN ONE OTHER LIST
** OCCURS IN TWO OTHER LISTS




Hilary D. Burton


Common  command  languages,  front-ends,  uniformizers,  interaction  languages, 
searcher's workbench, user-cordial interfaces - all of these phrases refer to
attempts  to  develop  tools  to  improve  human  utilization  of  computer-based 
information  systems.  For  the  most  part,  these  efforts  have  concentrated  on 
facilitating  use  of  bibliographic  systems  although  several  of  the  projects 
provide  models  which  could  be  useful  in  a  more  generic  approach. 

The  various  projects  have  approached  this  interface  or  interaction  of  user 
and  system  in  a  variety  of  ways  although  primarily  with  the  same  objective: 
to  alleviate  the  difficulties  encountered  by  a  user  who  must  deal  with  an 
ever-increasing,  heterogeneous  collection  of  on-line  databases.  Multiple 
systems  offer  multiple  databases.  Different  systems  structure  their 
retrieval  and  input/output  processing  differently.  The  same  system  will  not 
be  able  to  treat  all  files  it  processes  identically  except  insofar  as  they 
have  common  elements  and  this  is  quite  often  not  the  case.  Furthermore,  a 
given  database  may  change  over  time  and  thus  exist  within  a  system  in 
multiple versions. A single user may have to contend with intra- and
inter-database differences as well as intra- and inter-system variation.

And,  now,  with  the  growing  availability  of  tools  like  the  intelligent  gateway 
which  make  access  to  widely  distributed  systems  easily  available  at  the 
user's  discretion,  another  layer  of  variability  is  added  -  the  intra-network 
layer. This additional complexity is probably the factor responsible for the
change in attitude: the need to develop effective interfaces has gone
from being a theoretical goal in many system designers' minds to being a
practical necessity. And it is a necessity since the three factors mentioned
by Marcus several years ago are still just as true: (1)

1. established character of existing systems

2.  not  easy  to  modify  operational  systems  to  improve  or 

3.  standardization  is  difficult  -  especially  if  we’re  uncertain  about 
what  constitutes  a  good  standard. 


*Work performed under the auspices of the U. S. Department of Energy by the
Lawrence  Livermore  National  Laboratory  under  contract  number  W-7405-ENG-48. 




The  proposed  solutions  have  proceeded  along  several  courses:  in  some  cases 
an  educational  approach  was  taken  such  as  the  work  by  Meadow  and  others  on 
IIDA.  Another  approach  developed  out  of  vocabulary  analysis  and  switching 
research.  One  example  of  this  is  the  program  at  the  National  Library  of 
Medicine. Yet another type of activity has been concerned with
standardization of command language per se, the most comprehensive and
successful effort to date being the Euronet Common Command Language program. (2)

However,  more  commonly,  efforts  to  develop  solutions  to  the  interface  problem 
have involved a more comprehensive approach. CSIN, CONIT, FRED, OL'SAM, and
the National Bureau of Standards work on an intermediary processor are
examples of combined tactic approaches. They are of particular interest to
us at Lawrence Livermore National Laboratory because they have the potential
to interface to (or, in some cases, already incorporate aspects of) gateway
technology. Each has a slightly different focus, for example, CSIN's strong
support of chemical system searching or OL'SAM's evolution and subsequent
incorporation in the SCIMATE package offered by ISI.

Before  reviewing  these  particular  systems,  it  is  worthwhile  to  repeat  remarks 
made  by  John  Bennett  of  IBM  in  1972  when  on-line  information  services  were 
still  in  their  infancy.  His  foresight  is  impressive:  "There  are  several 
requirements  for  further  development  of  the  emerging  user-interface 
technology.  First,  it  is  imperative  to  cut  through  the  mass  of  inessential, 
application-specific  detail  and  to  overcome  confusion  in  terminology  so  that 
the  basic  similarity  of  user  services  required  in  many  applications  becomes 
clear.  Second,  observers  of  the  computer  scene  have  decried  the  tendency  of 
software  designers  to  produce  each  system  as  if  others  did  not  exist.  To  be 
successful  in  projecting  interactive  facilities  into  new  applications, 
designers  must  learn  to  build  on  the  work  of  others  and  stop  dissipation  of 
resources  on  unnecessary  duplicated  effort.  Third,  adoption  of  a  tool-design 
approach  will  make  it  obvious  where  human-engineering  skills  and  computer 
assisted  instruction  experience  can  help  provide  improved  interface  languages 
and training techniques." (3)

With  Bennett's  recommendation  in  mind,  we  would  like  to  review  the  various 
relevant  projects  we  have  identified  and  then  discuss  our  efforts  to 
integrate  such  work  with  our  TIS  intelligent  gateway. 

It  is  interesting  to  note,  as  Marcus  has,  that  there  has  been  a  shift  in 
attitude  from  asking  whether  such  intelligent  interfacing  can  be  done  to 
asking  whether  such  interfacing  will  be  as  effective  as  a  skilled  searcher 
using the selected system(s) and its dialog, help features, etc. directly. (4)
Much of his later work has been directed at collecting just such hard figures
concerning use of the CONIT system at MIT. (5) Carol Fenichel made this point
this morning when she said it was more complicated, in many respects, to
learn the new interfaces than to learn the actual system. And what is the
cost, and is there degradation of search quality?




Growing out of work begun in Project Intrex in the 1960s, CONIT provides an
intelligent interface to Lockheed, BRS, SDC, and the National Library of
Medicine systems. Requests in near-natural language are transformed into
forms appropriate to the system being used, and responses from the host system
are also translated, giving the user a view of a "virtual system" rather than
the disparate systems he is actually using.
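
The "virtual system" idea can be illustrated with a small table-driven translator. This is a sketch only, not CONIT's design: the host names `host_a` and `host_b`, the common verbs, and both command syntaxes are invented for illustration and do not reproduce the command languages of any of the systems named above.

```python
# Hypothetical per-host command templates keyed by a common verb.
TEMPLATES = {
    "host_a": {"search": "SELECT {terms}", "show": "TYPE {set_no}/{count}"},
    "host_b": {"search": "FIND {terms}", "show": "PRINT {count} FROM {set_no}"},
}


def translate(host, verb, **params):
    """Turn one common-language request into a host-specific command."""
    try:
        template = TEMPLATES[host][verb]
    except KeyError:
        raise ValueError(f"no translation for {verb!r} on {host!r}") from None
    return template.format(**params)


# The same user request rendered for two different (invented) hosts.
command_a = translate("host_a", "search", terms="solar AND energy")
command_b = translate("host_b", "show", set_no=1, count=5)
```

The user issues one request form; only the lookup table changes per host. Translating host responses back into a common form would need a matching set of parsers, which this sketch omits.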


CONIT software was also used by Meadow in his Individualized Instruction for
Data Access (IIDA) (6) project, which involved using the computer as a tool to
train would-be searchers, and by the Franklin Research Institute. The
Franklin Institute - Drexel University effort was a direct predecessor of
OL'SAM (Online Database Search Assistance Machine), (7) a microcomputer-based
system which handles logical multiplexing, access protocol management,
command and response translation, search strategy and response storage, user
assistance, and search activity logging into RECON/DIALOG and ORBIT/ELHILL
based systems.

David Toliver, who worked on OL'SAM (which is no longer marketed), is now
managing the group at the Institute for Scientific Information which has
SCIMATE as its primary product. SCIMATE, which is variously advertised as a
universal searcher and personal file manager, is available commercially for
CP/M and MS-DOS personal computers. (8) It accesses SDC, BRS, Lockheed, and
the National Library of Medicine systems and incorporates much of the design
philosophy of OL'SAM.

The Chemical Abstracts Searching Terminal, (9) an early product of the CSIN
work of the Computer Corporation of America, and Userkit, now Userlink, (10)
by Williams and Nivin, are also commercially available and can provide
subsets of the features available in SCIMATE.

Also  microcomputer-based  but  not  commercially  available  is  the  Searcher's 
Workbench  developed  by  Scott  Preece  and  Martha  Williams  at  the  University  of 
Illinois.  (11)  Their  prototype  system,  implemented  on  an  Alpha  Microsystems 
microcomputer,  accesses  five  databases  on  DIALOG  and  BRS  and  can  communicate 
with  the  Vocabulary  Switching  System  at  Battelle  and  the  automatic  Data  Base 
Selector  at  the  University  of  Illinois.  The  latter  features  allow  a  user  to 
develop  more  comprehensive  and  appropriate  search  queries  as  well  as  identify 
the  relevant  databases  to  query. 

Vocabulary  analysis  and  translation  is  a  key  feature  of  the  work  done  at  the 
National Library of Medicine under Charles Goldstein (12) and Tamas
Doszkocs. (13) Goldstein et al. have worked on a menu-based user-cordial
interface through which naive users can search the on-line public access
catalog (CATLINE) using natural language which is translated into
appropriately formatted, controlled headings for user review and selection.
Parallel  to  this  effort  is  the  CITE  (Current  Information  Transfer  in  English) 
project  which  features  a  natural  language  interface  to  MEDLINE  and  provides 
for  ranked  output  and  relevance  feedback  for  use  in  query 
refinement/modification. Both projects represent research which could be
extended  beyond  the  medical  information  environment. 

Representing  a  more  general  approach  is  the  work  done  by  Jakobson  and  others 
at the GTE Laboratories. (14) FRED, a Front End for Databases, is a
combination of hardware and software simulated in Interlisp on an IBM 3033
which provides access to multiple systems with one log-on, selects target
databases on the basis of the user's query, translates the user's natural
language commands into appropriate system commands and then converts host
responses to natural language for the user, and provides processing
information (host service information, costs, etc.) and traffic monitoring
and billing. Future plans include implementing FRED as a special purpose
database machine.




REFERENCES 




1.  R.  S.  Marcus,  "User  Assistance  in  Bibliographic  Retrieval  Networks 
Through  a  Computer  Intermediary,"  IEEE  Trans.  Systems,  Man,  and 
Cybernetics,  SMC-12  (2)  pp.  116-133,  1982. 

2.  A.  E.  Bennett,  "Development  of  the  EURONET-DIANE  Common  Command 
Language,"  Proc.  3rd  Int'l  Online  Information  Meeting,  pp.  95-98,  Learned 
Information, 1979.

3.  J.  L.  Bennett,  "The  User  Interface  in  Interactive  Systems,"  Annual  Review 
of  Information  Science  and  Technology,  Vol.  7,  pp.  189,  1972. 

4.  (See  (1)  above.) 

5.  R. S. Marcus and J. F. Reintjes, "A Translating Computer Interface for
End-User Operation of Heterogeneous Retrieval Systems. II. Evaluations,"
J. American Society for Information Science, 32 (4) pp. 304-317, 1982.

R.  S.  Marcus,  "An  Experimental  Comparison  of  the  Effectiveness  of 
Computers  and  Humans  as  Search  Intermediaries,"  J.  American  Society  for 
Information  Science,  34  (6)  pp.  381-404,  1983. 

6.  D. E. Toliver, "A Program for Machine-Mediated Searching," Information
Processing and Management 17 (2) pp. 61-68, 1981.

C.  T.  Meadow,  D.  E.  Toliver,  and  J.  V.  Edelmann,  "A  Technique  for  Machine 
Assistance to Online Searchers," Proc. ASIS Annual Meeting, 1978, pp.
222-225.  C. T. Meadow, "The Computer as a Search Intermediary (IIDA),"
Online, 3 (3) pp. 54-59, 1979.

7.  D.  E.  Toliver,  "OL'SAM,  An  Intelligent  Front-End  for  Bibliographic 
Information  Retrieval,"  Information  Technology  and  Libraries,  Dec.  1982, 
pp.  317-326. 

8.  E.  Garfield,  "Current  Comments,"  Current  Contents,  12  pp.  5-12,  1983. 

9.  A. J. Horowitz and R. Bergman, "Improved Productivity in Multi-System
Information  Retrieval  Through  the  CSIN  Intelligent  Terminal,"  Proc.  2nd 
National  Online  Information  Meeting,  pp.  287-292,  Learned  Information, 
1981. 

10.  P.  W.  Williams,  "A  New  Device  to  Simplify  Online  Searching  and  Reduce 
Costs,"  Proc.  2nd  National  Information  Meeting,  pp.  503-514,  Learned 
Information,  1981. 

11. M. E. Williams and S. E. Preece, "A Mini-Transparent System Using An
Alpha Microprocessor," Proc. 2nd National Online Information Meeting, pp.
499-502, Learned Information, 1981.

12.  C.  M.  Goldstein  and  N.  H.  Ford,  "The  User-Cordial  Interface,"  Online 
Review,  2  (3)  pp.  269-275,  1978. 




13. T. E. Doszkocs and B. A. Rapp, "Searching Medline in English, A Prototype
User Interface with Natural Language Query, Ranked Output, and Relevance
Feedback," Proc. ASIS 16, pp. 131-139, 1979.

T. E. Doszkocs, "Automatic Vocabulary Mapping in Online Searching," Int.
Classif. 10 (2) pp. 78-83, 1983.

14. M. I. Crystal and G. E. Jakobson, "FRED, A Front End for Databases,"
ONLINE 6 (5) pp. 17-30, Sept., 1982.

15. R. Rosenthal and B. D. Lucas, "The Design and Implementation of the
National Bureau of Standards Network Access Machine (NAM)," NBS Special
Publication 500-35, June, 1978.

16. S. Treu, "Uniformity in User Computer Interaction Languages, A Compromise
Solution," Int. Journal Man-Machine Studies, 16 pp. 183-210, 1982.


INTERMEDIARY  SYSTEMS  FOR  INFORMATION  RETRIEVAL 


1.  Background: Intermediary System Research


In  the  past  few  years  developments  have  continued  to  be  made  in  information 
science  and  technology.  Perhaps  the  most  exciting  prospects  are  those  where 
there  appears  to  be  emerging  a  mutually  supportive  cross  fertilization  in  which 
the  essential  elements  of  the  use  of  the  new  technology  are  captured  in  models 
of  information  processes  which  are  then  elaborated  and  used  to  fuel  the 
development  of  new  techniques.  We  have  investigated  one  such  line  of 
synergistic  interaction  involving  the  modeling  of  retrieval  processes  leading  to 
better  understanding  and  more  effective  analysis  procedures  and,  in  turn,  more 
rational  and  effective  information  processing  techniques  themselves. 

Modern  interactive  computerized  information  retrieval  systems  have  continued  to 
increase  their  utility  in  terms  of  expanded  size  and  comprehensiveness  of 
database  coverage  and  added  functionality  of  retrieval  operations.  A  core 
element  of  this  development  has  been  the  appreciation  that  the  computer  systems 
need  to  provide  more  than  just  some  basic  tools;  they  need  further  to  help  and 
assist  the  users  in  making  easy  and  effective  utilization  of  these  tools. 

We and others have pioneered research into this line of development by way of
the  mechanism  known  as  the  intermediary  system.  The  intermediary  system  serves 
as  an  agent  in  helping  the  user  to  access  and  operate  other  information  systems. 
Our  investigations  have  centered  on  the  use  of  such  an  intermediary  system  to 
provide  easier  and  more  effective  operation  of  multiple  and  heterogeneous 
bibliographic  information  retrieval  systems. 

To  evaluate  this  concept  we  implemented  and  tested  a  series  of  experimental 
intermediary  systems  under  the  generic  name  CONIT  (standing  for  COnnector  for 
Networked  Information  Transfer).  CONIT  systems  allow  computer-inexperienced 
users  to  access  and  operate  three  different  retrieval  systems:  NLM  ELHILL 
(MEDLINE),  SDC  ORBIT,  and  DIALOG.  CONIT  performs  the  following  functions:  (1) 
converses  with  users  in  a  simple,  common  language  which  is  self-instructional; 
(2)  assists  the  users  in  identifying  appropriate  databases  and  formulating 
search  statements;  (3)  automatically  connects  to,  and  performs  the  login 
protocol  for,  a  system  with  the  selected  database  and  translates  the  users' 
search  requests  into  commands  in  the  language  of  the  connected  system;  (4) 
reports the results of the search back to the users; and (5) assists the users
in  making  additional  requests  of  the  remote  systems  to  further  satisfy  their 
informational  needs. 

Controlled  experiments  have  been  conducted  (MARC83b)  to  compare  the 
effectiveness  of  the  CONIT  intermediary  with  that  of  human  expert  intermediary 
search  specialists.  Some  16  end  users,  none  of  whom  had  previously  operated 
either  CONIT  or  any  of  the  three  connected  retrieval  systems,  performed  searches 
on  20  different  topics  using  CONIT  with  no  assistance  other  than  that  provided 
by CONIT itself (except to recover from computer/software bugs). These same
users also performed searches on the same topics with the help of human expert
intermediaries  who  searched  using  the  retrieval  systems  directly.  Sometimes 
CONIT  and  sometimes  the  human  expert  were  clearly  superior  in  terms  of  such 
parameters  as  recall  and  search  time.  In  general,  however,  users  searching 
alone  with  CONIT  achieved  somewhat  higher  online  recall  at  the  expense  of  longer 
session  times.  Furthermore,  users  invariably  preferred  to  do  their  own 
searching  with  CONIT  as  opposed  to  undertaking  what  they  perceived  as  the 
difficulties  of  making  their  problems  understood  by  human  intermediary  agents. 

The  results  of  our  experiments  have  been  very  encouraging  and  we  have  performed 
additional analyses (MARC81a, MARC82, MARC83b) that indicate that the
intermediary  solution  could  be  highly  cost  effective  in  a  number  of  contexts. 

The  positive  results  of  our  research  have  helped  spawn  a  whole  new  field  of 
intermediary  system  development  which  we  have  reviewed  in  (MARC83b). 

While  these  new  developments  are  starting  to  have  a  beneficial  impact  on 
retrieval  system  use,  especially  by  the  end  user,  there  has  been  one  aspect  of 
our  investigations  with  CONIT  which  has  not  yet  been  fully  realized  within  these 
developments.  That  aspect,  which  has  important  implications  for  research  in 
information  science  theory  as  well  as  applications,  concerns  the  modeling  of  the 
search  process  and  the  incorporation  of  parts  of  the  model  as  enhanced  search 
techniques provided for assisting the user by the computer.

At  this  point  it  is  worthwhile  to  enumerate  those  several  principles  which  we 
believe  have  contributed  to  the  relative  success  of  our  experimental 
intermediary  systems.  In  particular,  we  list  the  following: 

(1)  The  heterogeneity  of  existing  systems  is  replaced  by  the  commonality 
of  the  virtual  system. 

(2)  The  complexity  of  current  system/user  interfaces  is  replaced  by  a 
simpler and easier-to-use interface.

(3)  Effective  instruction  is  given  by  the  computer  to  assist  the  user. 

(4)  Relatively  few  basic  retrieval  operations,  of  the  many  retrieval 
functions  available  on  existing  systems,  are  provided;  but  these  satisfy  most 
needs  of  most  end  users. 

(5)  Even  among  the  few  basic  retrieval  functions,  beginning  end  users 
initially  are  taught  still  fewer  core  functions;  additional  capabilities  may  be 
taught  as  needed. 

(6)  Inexperienced  users  can  take  advantage  of  relatively  simple  methods 
for  developing  search  strategies  that  are  effective  across  heterogeneous 
databases. 

These  principles  are  backed  up  by  other  techniques  and  methodologies.  One 
important  technique  is  a  simplified  command/argument  language  approach  that 
incorporates  elements  of  natural-language  and  menu  approaches.  Also,  computer 
instruction  is  dynamic;  that  is,  it  is  given  according  to  the  context  of  the 
search  process.  To  a  considerable  extent  the  intermediary  acts  as  an 
intelligent  agent  for  the  user  by,  for  example,  automatically  performing 
connection  and  login  protocols  and  keeping  track  of  all  searches  so  that  they 
can  be  regenerated  or  repeated  in  other  databases  as  needed. 


An  important  methodological  approach  is  to  develop  appropriate  models  of 
the  processes  involved.  Thus  a  model  of  interacting  independent  processes  as 
message  transmitters,  interpreters,  and  responders  led  to  a  translating-table, 
production-rule  based  interpreter  as  a  software  vehicle  for  mediating  between 
the  heterogeneous  computer  and  human  systems. 
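
The translating-table idea can be illustrated with a minimal sketch. It is
not the CONIT implementation: the host names and the host command strings
below are hypothetical placeholders, not the actual syntax of any retrieval
system, and a single table lookup stands in for the fuller production-rule
interpreter described above.

```python
# Hypothetical translation table: each common-language verb maps to one
# command template per host. The templates are illustrative placeholders,
# not the real command syntax of any retrieval system.
TRANSLATION_TABLE = {
    "find": {"host_a": "SEARCH {arg}", "host_b": "FIND {arg}"},
    "show": {"host_a": "DISPLAY {arg}", "host_b": "PRINT {arg}"},
}


def translate(user_line, host):
    """Translate one common-language command line into a host command."""
    # Split the line into a verb and the remainder (the argument).
    verb, _, arg = user_line.strip().partition(" ")
    rule = TRANSLATION_TABLE.get(verb.lower())
    if rule is None or host not in rule:
        raise ValueError(f"no translation rule for {verb!r} on {host!r}")
    return rule[host].format(arg=arg.strip())
```

In this sketch the user always types the same verb; only the table row
decides what is sent to the connected system, so supporting another host
means adding a column of templates rather than new code.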

Perhaps  most  importantly,  as  indicated  above,  we  developed  early  in  the 
project  a  preliminary,  informal  model  of  the  search  process  to  help  prioritize 
the  functions  most  needed  to  be  incorporated  into  the  common,  virtual 
intermediary  retrieval  system.  This  model  has  led  to  a  technique  for  effective 
searching  by  inexperienced  end  users  across  databases  with  heterogeneous  indexes 
based on natural-language, content-word-stem Boolean searching of free- and
controlled-vocabulary  subject  indexes. 
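
As a rough illustration of content-word-stem Boolean searching, the sketch
below turns a natural-language phrase into a conjunction of truncated word
stems. The stopword list, the crude suffix-chopping rule, and the "*"
truncation symbol are all assumptions made for this example; the source
does not specify the stemming rule actually used.

```python
# Assumed list of function words to drop; the real list is not given.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "for", "to", "by", "with"}


def crude_stem(word, min_len=4, cut=2):
    """Rough stand-in for stemming: drop the last `cut` letters of long words."""
    word = word.lower()
    if len(word) - cut >= min_len:
        return word[:-cut]
    return word


def stem_query(phrase, truncation="*"):
    """Build a Boolean AND of truncated content-word stems from a phrase."""
    stems = []
    for raw in phrase.split():
        word = raw.strip(".,;:?!\"'()")
        if not word or word.lower() in STOPWORDS:
            continue  # skip punctuation-only tokens and function words
        stem = crude_stem(word) + truncation
        if stem not in stems:  # keep each stem once, in order of appearance
            stems.append(stem)
    return " AND ".join(stems)
```

Because each stem is truncated, the same expression can match word variants
in both free-text and controlled-vocabulary index entries, which is the
property that makes the technique usable across heterogeneous indexes.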


2.  Current  Research  Directions  and  Recent  Progress 

Having  demonstrated  in  our  early  investigations  the  potential  of  the 
intermediary  assistant  approach,  we  began  in  1981  to  emphasize  a  particular 
direction  to  our  research:  an  investigation  of  the  possibility  of  developing  a 
comprehensive  and  formalized  model  of  the  search  process  which  could  be  employed 
to  design  truly  intelligent  search-agent  intermediary  computer  systems. 

Our  decision  to  focus  on  the  modeling  and  intelligent-agent  aspects  of  our 
research  derived  from  the  conclusion  that  it  was  these  aspects  that  contributed 
significantly  to  the  success  of  our  experimental  results  and,  more  importantly, 
that  these  aspects  were  key  to  the  possibility  of  major  further  advances  in  the 
science  and  techniques  of  information  transfer.  From  the  scientific  viewpoint 
we  need  the  insight  provided  by  models  to  forward  our  understanding  of  the 
search  process  and  to  enable  a  theoretical  analysis  of  the  comparative 
effectiveness  of  current  and  prospective  retrieval  procedures.  From  the 
viewpoint  of  the  development  of  retrieval  techniques  it  can  be  argued  that  while 
obvious  incremental  improvements  are  foreseeable  in  the  way  of  greater 
comprehensiveness  and  user-friendliness  for  computer  assistance,  there  must  be 
some  way  to  incorporate  highly  intelligent  agents  to  make  another  quantum  leap 
in  improved  retrieval  performance. 

Our  general  plan  has  been  to  elaborate  the  retrieval  models  while  enhancing 
the  experimental  intermediary  systems  to  test  the  efficacy  of  the  models  for  (1) 
capturing  the  essence  of  the  search  process  and  (2)  providing  for  improved 
retrieval  techniques  through  expert  computerized  search  assistance.  Progress 
has  been  achieved  in  executing  the  general  plan  along  four  lines:  (1)  models 
for  the  general  retrieval  process  and  for  particular  measures  of  indexing  and 
search  effectiveness;  (2)  enhancement  of  computerized  search  assistance;  (3) 
testing  of  enhancements;  and  (4)  extension  of  intermediary  assistance  to 
generalized  information  processing. 


2.1  Modeling  Progress 


A general model of the search process has been developed. As summarized in
a recent paper [MARC83a], there are five components of this model corresponding
to stages of the search process: (1) formalized problem representation (FPR)
including a Boolean-structured topic representation (BTR) and associated problem
aspects (e.g., quantified recall goals and search cost constraints); (2) search
strategy formulation including database selection and formulation of searches
based on the FPR; (3) execution of search strategy; (4) evaluation of search
effectiveness; and (5) search reformulation and rerunning including closing the
feedback loop back to stage (1) or (2).
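
The five stages can be pictured as a feedback loop. The sketch below is one
possible reading under stated assumptions, not the model of [MARC83a]: the
class and function names are hypothetical, a wanted-document count stands in
for a quantified recall goal, and the `execute` and `broaden` callbacks are
placeholders for the retrieval system and the reformulation step.

```python
from dataclasses import dataclass


@dataclass
class ProblemRepresentation:
    """Stage 1: formalized problem representation (FPR)."""
    concepts: list         # Boolean topic: OR within a concept, AND across concepts
    wanted_documents: int  # assumed stand-in for a quantified recall goal
    cost_limit: float      # search cost constraint


def formulate(fpr):
    """Stage 2: turn the Boolean topic representation into a search statement."""
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in fpr.concepts)


def run_search(fpr, execute, broaden, max_rounds=3):
    """Stages 3-5: execute, evaluate, and reformulate until a goal or limit is met.

    `execute(statement)` must return (documents_found, cost) and
    `broaden(fpr)` must return a revised FPR; both are supplied by the caller.
    """
    spent = 0.0
    history = []
    for _ in range(max_rounds):
        statement = formulate(fpr)
        found, cost = execute(statement)  # stage 3: execution
        spent += cost
        history.append((statement, found))
        # Stage 4: evaluation against the recall goal and the cost constraint.
        if found >= fpr.wanted_documents or spent >= fpr.cost_limit:
            break
        fpr = broaden(fpr)                # stage 5: reformulation, back to stage 2
    return history, spent
```

Here the loop returns to stage (2) with a revised representation; returning
to stage (1) would correspond to the caller rebuilding the FPR itself.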

An  explicit  aspect  of  the  general  model  is  the  Boolean-based  topic 
representation.  Implicitly,  therefore,  the  model  presumes  effective  searching 
can be achieved in the Boolean framework. In fact, our previous work (see,
e.g., [OVER74] and [MARC83b]) has indicated that certain augmented Boolean
techniques  could,  indeed,  be  very  effective  in  searching,  especially  by 
inexperienced  users  in  the  context  of  heterogeneous  databases. 


2.2  Computerized  Search  Assistance  Enhancements 


Enhancement  of  the  CONIT  experimental  intermediary  search  assistant  has 
been accomplished with two purposes in mind: (1) increased intelligence and
sophistication  of  search  assistance,  as  by  incorporating  features  of  the  models, 
and  (2)  increased  suitability  of  the  intermediary  system  for  us  and  others  to 
perform  retrieval  system  experiments. 

One  enhancement  that  serves  both  modeling  and  experimental  purposes  was  to 
incorporate  within  CONIT  a  facility  to  identify  and  record  all  computer-related 
cost  components,  both  from  the  remote  retrieval  systems  and  the  intermediary 
system. (For details on this facility see, for example, theses by Feinstein
[FEIN82]  and  Weber  [WEBE83].)  Costs  that  must  be  identified  include  those 
associated  with  connect  time  for  the  systems  and  network  connectors  and  with 
online  and  offline  print  charges.  This  facility  permits  not  only  retrospective 
review  of  charges  but  a  prospective  analysis  of  future  costs  for  planning 
purposes. 

Associated  with  the  cost  analysis  facility  is  a  new  accounting  facility 
(see  thesis  by  Lee  [LEE83])  which  permits  individual  and  group  accounts  on  CONIT 
and  the  several  remote  retrieval  systems.  Costs  for  the  different  accounts  are 
recorded  and  cumulated  dynamically  and  maximum  costs  can  be  set  preventing  users 
from  accumulating  costs  beyond  set  limits. 
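
A minimal sketch of such dynamic cost accounting follows. It assumes a simple
two-level arrangement in which an individual account rolls up into a group
account; the class names and the all-or-nothing charging rule are
illustrative assumptions, not a description of the facility in [LEE83].

```python
class CostLimitExceeded(Exception):
    """Raised when a charge would push an account past its maximum."""


class Account:
    def __init__(self, name, limit, parent=None):
        self.name = name
        self.limit = limit      # maximum cumulative cost allowed
        self.parent = parent    # e.g. the group this individual belongs to
        self.total = 0.0        # cost cumulated so far

    def charge(self, amount):
        """Record a cost on this account and on every enclosing account."""
        # First check every level, so a refused charge changes no totals.
        node = self
        while node is not None:
            if node.total + amount > node.limit:
                raise CostLimitExceeded(f"{node.name}: limit {node.limit} reached")
            node = node.parent
        # Then cumulate the cost at every level.
        node = self
        while node is not None:
            node.total += amount
            node = node.parent
```

Checking all limits before recording anything keeps the individual and group
totals consistent when a charge is refused.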

A  new  search  cataloging  facility  is  nearing  completion  (see  Schwartz  thesis 
[SCHW84])  which  permits  search  statements  (and  search  results  when  executed) 
across  the  network  of  retrieval  systems  to  be  saved  in  individual  and  group 
"catalogs".  This  facility  will  permit  testing  and  evaluation  of  the  concept  of 
the  (possibly  long-term)  development  of  search  strategies  and  their  subsequent 
utilization  within  individual  or  group  usage  scenarios. 

Another  new  facility  has  provided  one  more  direct  example  of  the 
incorporation  of  intelligent  aids  in  intermediary  systems.  This  facility  (for 
details see thesis by Kutin [KUTI84]) allows the intermediary system to select an
appropriate  path  through  the  network  to  a  desired  database.  Rather  than  simply 
take  a  fixed  priority  list  of  paths,  this  path  selection  algorithm  keeps  a 
record  of  path  selection  attempts,  as  well  as  retrieval  system  schedules  and 
database  availability,  and  chooses  new  connector,  network,  and  system  links 
based  on  current  indications  of  success  or  failure. 
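
The contrast with a fixed priority list can be sketched as follows. This is
not the algorithm of [KUTI84]: the path names, the hour-based schedule, and
the smoothed success-rate score are assumptions chosen only to show how a
record of past attempts and availability can drive the choice.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str            # hypothetical label for a connector/network/system route
    open_hour: int       # first hour of the day the route is scheduled to be up
    close_hour: int      # hour at which the route becomes unavailable
    successes: int = 0   # recorded successful connection attempts
    failures: int = 0    # recorded failed connection attempts

    def available(self, hour):
        return self.open_hour <= hour < self.close_hour

    def score(self):
        # Smoothed success rate, so an untried path starts at 0.5.
        return (self.successes + 1) / (self.successes + self.failures + 2)


def choose_path(paths, hour):
    """Pick the available path with the best recorded success rate."""
    candidates = [p for p in paths if p.available(hour)]
    if not candidates:
        return None
    return max(candidates, key=Path.score)


def record_attempt(path, succeeded):
    """Update the attempt record that later choices are based on."""
    if succeeded:
        path.successes += 1
    else:
        path.failures += 1
```

Each connection attempt feeds back through `record_attempt`, so a route that
starts failing is gradually passed over in favor of alternatives.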



The main thrust of our experimental development has been to determine how
well our search models can be incorporated into the intermediary assistance
system. In an early manifestation of this effort Yip [YIP81] implemented an
experimental  intermediary  system,  termed  EXPERT-0,  which  had  a  rudimentary  form 
of  the  five  stages  of  our  search  model  as  described  above.  EXPERT-0  was 
implemented  after  the  style  of  expert  systems  of  the  artificial  intelligence 
genre  and  facilitated  a  question-and-answer  dialog  by  which  the  intermediary 
system  assisted  the  user  in  preparing  a  Boolean  topic  representation  and  a 
search  strategy.  EXPERT-0  then  automatically  executed  the  search  strategy,  led 
the  user  to  review  the  catalog  records  of  documents  thus  found,  and  prompted  the 
user  to  revise  the  search  strategy  after  reviewing  this  feedback  —  particularly 
in  adding  or  deleting  individual  search  terms  and  whole  concept  factors  based  on 
relevance  considerations. 

As  we  reported  in  [MARC81b],  there  appeared  to  be  important  potential  for 
enhanced  assistance  in  aspects  of  EXPERT-0  but  a  major  deficiency  in  this 
preliminary  implementation  was  a  lack  of  integration  of  the  expert  modes  with 
the  "standard"  CONIT  modes.  In  particular,  we  concluded  that  for  a  truly 
effective  intermediary  assistant  one  needed  not  only  the  relatively  few,  albeit 
highly  automated,  modes  of  EXPERT-0  but  also  the  many  modes  and  functions 
provided  by  CONIT  —  including  the  ability  for  the  user  to  direct  or  initiate 
activity  (e.g.,  through  the  command  mode)  as  well  as  for  the  computer  system  to 
direct  or  control  operations.  Thus  we  have  striven  to  design,  implement  and 
evaluate  an  enhanced  CONIT  that  would  integrate  and  extend  the 
computer-directed,  expert-styled,  formalized  planning  and  evaluative, 
menu-oriented features of EXPERT-0 with the more extensive user-directed,
command-oriented,  informally-tutorial  features  of  standard  CONIT. 

We  have  made  progress  toward  achieving  this  mixed-initiative,  integrated 
intermediary  system.  The  basic  design  for  such  a  system  was  described  in 
[MARC83a].  Various  data  structures  devised  for  the  catalog  system  provide  a 
basis  for  integrating  the  standard  search  structures  with  the  new  problem 
representation and evaluation structures. A meta level was added by which users
could in command mode (1) construct topic representations; (2) generate search
strategies  from  these  representations;  and  (3)  execute  the  search  strategy.  In 
addition,  we  are  in  the  midst  of  adding  a  meta-meta  level,  labeled  "ASSIST," 
which  assists  users  with  a  question-and-answer  menu-oriented  mode  in  performing 
appropriate  construction,  generation  and  execution  operations.  One  unique 
feature  of  ASSIST  is  the  explication  for  the  user  of  the  commands  that  are 
implied  by  his  menu  selection  actions  and  answers  to  questions  so  as  to  help  the 
user  understand  what,  in  fact,  is  being  done  for  him  and  ease  the  way  to 
user-directed  command  operations  if  and  when  he  chooses  to  take  such 
initiatives.  A  main  goal  of  our  research  is  to  complete  these  meta-meta  levels, 
especially  in  regard  to  the  search  evaluation  components,  and  analyze  their 
impact  on  the  modeling  and  retrieval  assistance  objectives. 
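
The explication feature of ASSIST might be pictured as below. The menu
labels and command names are hypothetical, invented for illustration; they
are not CONIT's actual commands. The point of the sketch is only that each
menu choice yields both the command to run and a message showing the user
the equivalent command.

```python
# Hypothetical menu: choice -> (label shown to the user, command template).
MENU = {
    "1": ("Describe your topic", "topic {answer}"),
    "2": ("Build a search strategy from the topic", "generate"),
    "3": ("Run the search strategy", "run"),
}


def assist(choice, answer=""):
    """Return the command implied by a menu choice, plus an explanation of it."""
    label, template = MENU[choice]
    command = template.format(answer=answer).strip()
    explanation = f'You chose "{label}". The equivalent command is: {command}'
    return command, explanation
```

Showing the implied command each time is what eases the way from the
menu-oriented mode to user-directed command operation.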


2.3  Experimental  Testing  and  Evaluation 


Taking  advantage  of  the  new  experimental  tools  described  above,  we  have 
begun  to  perform  new  and  wider-ranging  evaluation  of  the  models  and  techniques 
we  have  developed.  Our  current  experiments  break  new  ground  over  our  previous 
experiments  in  at  least  four  major  respects.  First,  of  course,  we  are  starting 
to get some experience with the new functional capabilities of CONIT. Second,
we have switched from the strictly controlled environment in which users
operated  the  computer  under  our  direct  observation  from  terminals  in  our  own 
laboratory,  to  an  "open"  environment  in  which  users  engage  the  system  at  times 
and places entirely of their own choosing, generally in their own labs or
offices.  There  is  some  loss  of  information  in  an  open  environment  but  this  is 
more  than  counterbalanced  by  the  greater  realism  achieved  and  potential  for  more 
extensive  user  participation  in  the  experiments. 

A third experimental variation enables us to obtain additional information:
a record of the computer response time and the user response (think) time for
each operation; previously, we could not distinguish these two. A fourth, and
highly significant, variation is the user involvement with costs. In previous
experimental situations the project bore the full cost of all computer charges;
in these experiments the full costs are being borne by the user's organization,
and the user is made aware of the amount spent so far and the maximum amounts
expendable for any one session, for that user in all sessions, and for the
organization as a whole. Along with the open environment context, then, these
experiments have a much more realistic setting than the previous ones.

The  experiments  are  being  partially  sponsored  by  the  National  Library  of 
Medicine  and,  therefore,  we  have  emphasized  medical,  biomedical,  and 
health-related topics. The two main organizations participating in the
experiments  so  far  are  the  Hudson  River  Foundation  —  an  environmental  research 
organization,  and  the  M.I.T.  Laboratory  for  Computer  Science  —  particularly  its 
medical  (clinical  decision-making)  group.  We  are  in  the  early  stages  of  these 
experiments;  as  of  this  date  there  have  been  approximately  six  users.  However, 
four  of  these  users  made  extensive  use  of  the  system  for  a  total  of  over  24 
sessions. 

We have not yet made a detailed analysis of the early results of these new
experiments. However, these early results do seem to substantiate a few general
observations.  First  of  all,  most  sessions  and  most  users  appear  to  have 
obtained  relevant  and  useful  document  references  fairly  quickly  without  any 
instruction  other  than  that  given  by  the  intermediary  system.  In  this  respect 
we  are  getting  evidence  to  support  some  of  the  previous  successful  experimental 
results  but  in  the  more  difficult  and  realistic  context  of  the  open, 
cost-sensitive  environment. 

The  cost  factor  does  appear  to  have  a  major  effect  on  user  behavior.  For 
example,  while  average  session  times  for  previous  experiments  ran  about  100 
minutes,  in  these  recent  experiments  session  times  were  on  the  order  of  20 
minutes. 

In  addition  to  our  own  experiments  we  have  begun  to  permit  experiments  and 
demonstrations  with  CONIT  by  fellow  researchers.  So  far  there  have  been  several 
demonstration users in locations around the country. As explained below, we
intend to broaden the scope of this activity so as to make our own work better
known  and  more  beneficial  to  others  while  enhancing  the  opportunities  for 
scientific  interaction  among  researchers. 


2.4  Intermediary  Systems  in  Generalized  Information  Contexts 


We  have  begun  to  analyze  the  applicability  of  the  intermediary  system 
concept  for  networking  heterogeneous  information  systems  outside  the  specific 
area  of  bibliographic  retrieval  systems  which  has  been  our  focus  to  date.  Thus 
consideration  is  being  given  to  user  assistance  for,  and  the  integration  of, 
such functions as computerized messaging and conferencing; text preparation,
editing,  and  composing;  and  data  (numeric)  as  well  as  bibliographic  storage  and 
retrieval.  We  mention  these  new  ventures  here  to  point  out  the  potential 
broader  applicability  of  intermediary  systems  beyond  the  text-based  document 
storage  and  retrieval  systems.  Certainly,  the  development  of  more  comprehensive 
theories  and  application  techniques  in  information  science  and  technology  will 
require  efforts  in  integrating  these  various  informational  activities. 


3.  Relationships to Other Efforts


As  we  have  mentioned  above,  our  early  work  in  intermediary  assistance 
systems  helped  spawn  a  major  new  activity  in  the  field  of  information  science 
and  technology.  In  (MARC83b)  we  described  in  some  detail  this  burgeoning  new 
activity. Some of the important investigators listed include M. Williams
(WILL80), Meadow (MEAD82), Goldstein (GOLD78), Fayen (FAYE29), Toliver (TOLI81),
Doszkocs (DOSZ80), Smith (SMIT80), Lefkovitz (LEFK82), and P. Williams (WILL81);
the intermediary creations they investigated have names such as Searcher's
Workbench, IIDA (Individualized Instruction for Data Access), the User Cordial
Interface, OLSAM (On-Line Search Assistance Machine), CSIN (Chemical Substances
Information Network), NAM (Network Access Machine), CITE, SCI-MATE, and
USER-LINK. Since the writing of (MARC83b) the intermediary activities have
continued  to  blossom  and  we  now  have  such  new  entries  as  Searchmaster  by  SDC, 
IN-SEARCH  by  Menlo  Corp.,  and  Search  Helper  by  Information  Access  Co. 

Most  of  the  intermediary  systems  alluded  to  above  have  some  unique  features 
(e.g.,  sophisticated  analysis  of  user  searching  by  IIDA,  certain  natural 
language  analysis  by  CITE,  and  search  assistance  for  specialized  areas  in  CSIN). 
Our  experimental  CONIT  system  may  be  distinguished  for  the  combination  of 
projecting  a  common-interface  (virtual-system)  approach  to  accessing  a  network 
of heterogeneous, broad-based systems and databases while providing extensive
assistance in generating and executing searches on a broad range of topics.
Equally  important  to  our  approach  is  the  attention  to  the  information  science 
aspects  of  intermediary  investigations.  Our  experimental  testing  and 
evaluations,  along  with  our  analyses  for  the  information  science  ramifications, 
have been quite extensive and unique in their mix as compared with other
investigations. Our current emphasis on formalized models of the indexing and
retrieval processes which can be incorporated into experimental systems and
tested in realistic contexts presents another point of departure for our
approach. 

Many  investigators  have,  of  course,  studied  various  aspects  of  modeling  of 
information  systems.  A  few  examples  in  the  area  of  search  models  include  Bates 
(BATE79),  Jahoda  (JAH074),  and  Markey  and  Atherton  (MARK78).  Our  own  efforts 
are  in  the  direction  of  extending  and  formalizing  these  other  works  and 
incorporating  these  models  into  a  concrete  form  for  experimentation  and 
analysis.  We  believe  our  efforts  in  this  respect  have  certain  unique  aspects  — 
e.g.,  the  dynamic  evaluation  of  searching  and  the  incorporation  of  all  model 
components  into  expert  assistance  systems. 


Acknowledgments 

The  research  described  in  this  paper  was  supported  by  the  National  Library 
of  Medicine  under  Grant  LM-03210,  by  the  National  Science  Foundation  Division  of 
Information Science and Technology under grants IST 8201842 and 8414485, and by the
Hudson  River  Foundation  under  grant  4/84X-4.  The  author  also  acknowledges  the 
many  vital  contributions  of  his  project  co-workers;  in  particular,  see 
references  in  Section  2.2  of  this  paper. 


REFERENCES 


(BATE79)  Bates,  Marcia  J.  "Information  Search  Tactics".  Journal  of  the  American 
Society for Information Science. 30(4):205-214; July 1979.

(DEAN80)  Deane,  Thomas  J.  "Automatic  Database  Selection  through 
Multidisciplinary  Searching  with  Classification  Mapping."  Master  of  Science 
Thesis  in  Electrical  Engineering  and  Computer  Science,  Massachusetts  Institute 
of  Technology.  August,  1980. 

(DOSZ80) Doszkocs, Tamas; Rapp, Barbara A.; and Schoolman, Harold M.
"Automated Information Retrieval in Science and Technology." Science 208:25-30;
April 4, 1980.




(MARK78)  Markey,  Karen;  Atherton,  Pauline.  ONTAP:  Online  Training  and  Practice 
Manual for ERIC Data Base Searchers. Syracuse University ERIC Clearinghouse on
Information Resources Report, June, 1978.

(MCGI79) McGill, Michael J.; Huitfeldt, Jenifer. "Experimental Techniques of
Information Retrieval." In: Martha E. Williams (ed.), Annual Review of
Information Science and Technology. 14:93-127; 1979.

(MCGI83)  McGill,  Michael  J.;  Brown,  Gerhard  E.;  and  Siegel,  Sidney.  "The 
Chemical Substances Information Network: Its Evaluation and Evolution."
Proceedings of the 1983 National Online Meeting. Medford, N.J.: Learned
Information, Inc. pp 367-376; April 1983.

(MEAD79) Meadow, Charles T. "The Computer as a Search Intermediary". Online
3(3):54-59; July, 1979.

(MEAD82) Meadow, C. T.; Hewett, T. T.; and Aversa, E. S. "A Computer
Intermediary for Interactive Database Searching; Part I: Design; Part II:
Evaluation". Journal of the American Society for Information Science.
33(5):325-332 and 33(6):357-364. September and November, 1982.

(OVER74) Overhage, C. F. J.; Reintjes, J. F. "Project Intrex: A General
Review". Information Storage and Retrieval. 10(5):157-188; 1974.

(PREE80) Preece, Scott E.; Williams, Martha E. "Software for the Searcher's
Workbench." Proceedings of the 43rd ASIS Annual Meeting. 17:403-405. October,
1980.

(ROSE75) Rosenthal, Robert. "Accessing Online Network Resources with a Network
Access Machine." IEEE Intercon Conference Record. Session 25/3; April 1975.


(SARA71) Saracevic, Tefko. "Selected Results from an Inquiry into Testing of
Information Retrieval Systems." Journal of the American Society for Information
Science. 22(2):126-139; 1971.


INTRODUCTION 


o  GOAL  OF  NATURAL  LANGUAGE  INTERFACES 

to  allow  the  user  to  state  complex  questions  or 
commands  in  his  own  native  mode  with  little  training 

o  THREE  PROBLEMS  WITH  CONVENTIONAL  NATURAL  LANGUAGE  INTERFACES 

I  they are not "easy to use"

o  "to  put  constraints  on  what  English  is  acceptable 
and  what  is  not  violates  the  spirit  of  the  task" 

— De  Jong 

II  they  are  expensive  to  build  and  maintain 

III  their  capabilities  are  limited 


o  SOLUTION  APPROACH:  The  user  meets  the  system  half  way 
by  MATCHING  the  user's  and  the  system's  capabilities. 

This "match" is realized by a "Menu-based Natural Language
Understanding"  approach,  as  implemented  in  the  NLMENU  system. 


HABITABLE  SUBLANGUAGE 


"one  in  which  users  can  express  themselves  without  straying 
over  the  boundaries  into  unallowed  sentences"  — [Watt,  1968] 


EVALUATION RESULTS FOR CONVENTIONAL NLIs

o  Coverage  Mismatch  Problems 

o  semantic overshoot  =  users' queries overreach the
                          capabilities of the system

o  semantic undershoot =  users fail to make use of many of
                          the capabilities of the system


EX  What types of aircraft are there?    <-- accepted by PLANES
    What planes do you know about?       <-- rejected by PLANES


o  Other  Results 

o  users still require training in the use of a natural language
   interface.  Experiments show that users can use QBE (a DBMS
   query language) with a comparable amount of training

o  users' queries tend to be short and simple, about 7 words +/-,
   yet 1/3 of their queries fail

o  users are often unable to solve their problems using NL


[Screen images: two NLMENU displays for the Supplier-Parts interface.
Menu panes list nouns and value experts (<specific parts>,
<specific suppliers>, <specific shipments>, <a new part>, ...), attributes
(quantity, city, name, part#, supplier#, status, color, weight),
comparisons (greater than, less than, greater than or equal to, less than
or equal to, equal to), and modifier phrases (whose part name is, whose
supplier city is, which are shipments of, ...).  The command pane offers
Re-start, Refresh, Rubout, Save Q, Show query, Retrieve Q, Edit item,
Delete Q, Exit system, and Play Q.  The first screen shows the query
"Find parts" with the retrieved PART relation (part#, name, color, weight,
city); the second shows the partially composed query "Find parts whose
color is".  Execution completed in 2.95 seconds.]

CONVENTIONAL NATURAL LANGUAGE       MENU-BASED NATURAL LANGUAGE

10 - 50% failure rate               0% failure rate

typing required                     selection through pointing,
                                    speech, typing

possible spelling errors            no spelling errors

"creating" a sentence is hard       "recognizing" a sentence is
                                    easier

no support for the user during      supports the user during
query composition                   query composition

intimidates users                   encourages exploration

users are assumed to be familiar    users can use NLMENU to
enough with an application to       explore a new application
ask questions

user training is required to        much less training is required
learn the limits of the system

specific 'help' on using the        'help' on system features
system is often not available       is only a mouse button away

mysterious about coverage           obvious about coverage
-limitations are implicit           -makes limits explicit
-burden is on the user to infer
 the coverage of the system

Figure 1-13:  Advantages of Menu-Based Natural Language
for the End User

INPUTS:  grammar
         lexicon
         root
         interface name
         window description
         active commands


NLMENU DRIVER


LOAD grammar, lexicon, windows, commands,
experts, target software system
(AS NEEDED, COMPILING WHERE NECESSARY)


USER CHOOSES item FROM pane


SELECT <lex, category, window-pane, translation>
WHERE pane = window-pane & item = lex


COMMAND?(window-pane):

EXECUTE-COMMAND(lex)  |  :restart
                         :rubout
                         :show parse
                         :show translation
                         :execute
                         :EXIT
                         :item help
                         :menu help
                         :edit item
                         :edit item help
                         :edit menu help
                         :edit expert
                         :save query
                         :load query
                         :delete query
                         :cursor movement


EXPERT?(translation):

EXECUTE(translation)
RETURNING <lex, translation>


PARSE(lex)


FIND COMPLETIONS


CALCULATE INACTIVE &
ACTIVE PANES &
REFRESH DISPLAY AS NEEDED


MAIN NLMENU INPUT LOOP
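
The loop above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not the NLMENU implementation: the
Session class, its field names, and the prefix table standing in for the
parser are all hypothetical, and only two of the listed commands (rubout and
restart) are modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Minimal stand-in for the driver state (all names hypothetical)."""
    legal_next: dict                                # sentence prefix (tuple) -> selectable phrases
    experts: dict = field(default_factory=dict)     # menu phrase -> function returning a value
    sentence: list = field(default_factory=list)    # menu phrases chosen so far
    values: list = field(default_factory=list)      # values returned by experts

    def active_items(self):
        """FIND COMPLETIONS: phrases that may legally follow the sentence so far."""
        return self.legal_next.get(tuple(self.sentence), set())

    def command(self, name):
        """COMMAND? branch: only rubout and restart are sketched here."""
        if name == "rubout" and self.sentence:
            removed = self.sentence.pop()
            if removed in self.experts:
                self.values.pop()                   # discard the value that expert supplied
        elif name == "restart":
            self.sentence.clear()
            self.values.clear()
        return self.active_items()

    def choose(self, lex):
        """One pass through the loop for a phrase picked from a menu pane."""
        if lex not in self.active_items():
            raise ValueError(f"{lex!r} is in an inactive pane")
        if lex in self.experts:                     # EXPERT? branch
            self.values.append(self.experts[lex]())
        self.sentence.append(lex)                   # PARSE(lex)
        return self.active_items()                  # used to refresh active/inactive panes
```

Each call returns the set of phrases that should be active on the next
display refresh, so a phrase outside that set can never be selected.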


PROBLEM  II:  BUILDING  INTERFACES  CHEAPLY  AND  QUICKLY 


o  THE  GOAL:  TRANSPORTABLE  NLIs 

If  you  can’t  make  natural  language  interfaces  cheaply,  then 
they  won’t  be  widely  used.  Cost  is  a  function  of  portability. 

Extremes  in  portability  are:  a  system  that  requires  complete 
reprogramming  before  porting  -and-  a  system  that  requires  no 
extra  effort. 


o  STATE  OF  THE  ART:  AI  Corporation’s  INTELLECT 
o  costs around $60K for the system

o  requires two man-months for a trained interface designer
to build a single interface
o  interfaces must be empirically tuned

o  resulting  interfaces  have  all  the  ease-of-use  problems 

SO — NL  interfaces  will  only  be  built  for  important  applications 
and  users  will  still  need  training 


o  TWO  NLMENU  SOLUTIONS 

o  GRAMMAR WRITER'S TOOLKIT -- available on PCs and Explorers
o  INTERFACE GENERATOR -- available on Explorers


VARIETIES  OF  PORTABILITY 


o  machine and programming language independence:  Lisp, C, ...

o  source NL independence:  English, ...

o  target system independence:  RTMS, SQL, PROLOG, ...
                                db updates, graphs, info retrieval, ...

o  application independence:                    (#rel/#attr)

       University                                   5/23
       Supplier-Parts                               3/12
       City Planner                                 2/32
       Military Demo                                4/28
       System Relations                             5/33
       Austin Restaurants                           1/10
       Baseball Statistics                          3/48
       Ladder Blue File                            14/72

o  schema independence:  EMP-DEPT-SAL  DEPT-MGR  vs
                         EMP-MGR-SAL   MGR-DEPT

       use views, defined fields, aliases, and
       value coercion but then display the schema


[Screen image: NLMENU system menu, Texas Instruments, Inc.

System Commands:  Tutorial, Build Interfaces, Guided SQL -- Oracle 3.8,
Execute Saved Queries, Report Writer, EXIT NLMENU SYSTEM

User-owned Interfaces (owner THOMPSON), each listed with its type and
creation date and time:  Congressmen Toy Demo, Courses, EG demo,
OST Packages, Supplier-Parts, TI OBITS Survey, Upcoming Conferences,
Blue File, TOR

Interfaces Granted to the user:  Supplier-Parts

Public Interfaces:  Jobshop demo (DAVIS), Baseball demo (ROSS)

Legend:  * = Loaded Interface
         M = Manually Generated, A = Automatically Generated
         TI = Lisp Machine translations, SQL = SQL translations]


[Screen image: NLMENU display for the Courses interface.  Menu panes list
nouns, attributes (e.g., credit, title, instructor), comparisons (greater
than, less than, greater than or equal to, less than or equal to, equal
to), and value experts (<specific courses>, <specific instructors>, ...).
The command pane offers Re-start, Rubout, Show query, Retrieve Q, Exit
system, Play Q, Refresh, Save Q, and Delete Q.  The query shown is "Find
sections which are offerings of the course" followed by a specific CS
course; the retrieved SECTION relation lists department, course#,
section#, start hour, end hour, days, room, and instructor.]


[Screen image: NLMENU Interface, City Planner.  Menu panes list nouns
(parcels, owners, neighborhoods), attributes (e.g., assessed value, area
in sq ft, lot area, number of parking spaces), comparisons, modifier
phrases (whose parcel zone is, whose parcel planning area is, ...), and
value experts.  The command pane offers Re-start, Refresh, Rubout, Save Q,
Show query, Retrieve Q, Execute, Delete Q, Exit system, and Play Q.  The
query shown is "Find parcels and owners of parcels whose parcel zone is"
followed by a zone value.  The display reports one parse and the SQL
translation: a SELECT of parcel numbers and owners FROM the parcel table
WHERE the zone matches the chosen value.]


S             -->  Find (PART-ATTRS of) parts (PART-MODS)

PART-ATTRS    -->  PART-ATTR (and PART-ATTRS)

PART-ATTR     -->  {color weight part# ...}

PART-MODS     -->  PART-MOD (and PART-MODS)

PART-MOD      -->  whose color is COLOR-VALUE

PART-MOD      -->  whose weight is WEIGHT-VALUE

COLOR-VALUE   -->  {red blue ...}

EX  Find part# and weight of parts whose color is red


A  SEMANTIC  GRAMMAR 
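
A grammar like this one can drive the menus directly: after each selection,
only the phrases that can legally continue the sentence stay active. The
sketch below is a hedged illustration, not the NLMENU parser. It assumes the
optional constituents are written out as separate alternatives, that
multi-word phrases such as "whose color is" are single menu items, and that
WEIGHT-VALUE (left undefined on the slide) is a placeholder item <weight>;
the names GRAMMAR, expand, and next_phrases are hypothetical.

```python
# Semantic grammar from the slide, with optional parts spelled out as alternatives.
GRAMMAR = {
    "S": [["Find", "PART-ATTRS", "of", "parts", "PART-MODS"],
          ["Find", "PART-ATTRS", "of", "parts"],
          ["Find", "parts", "PART-MODS"],
          ["Find", "parts"]],
    "PART-ATTRS": [["PART-ATTR"], ["PART-ATTR", "and", "PART-ATTRS"]],
    "PART-ATTR": [["color"], ["weight"], ["part#"]],
    "PART-MODS": [["PART-MOD"], ["PART-MOD", "and", "PART-MODS"]],
    "PART-MOD": [["whose color is", "COLOR-VALUE"],
                 ["whose weight is", "WEIGHT-VALUE"]],
    "COLOR-VALUE": [["red"], ["blue"]],
    "WEIGHT-VALUE": [["<weight>"]],   # assumption: stands in for a typed-in value
}


def expand(stack):
    """Yield every stack reachable by rewriting leading nonterminals.

    Each yielded stack is either empty or starts with a menu phrase.
    Terminates because the grammar has no left recursion.
    """
    if not stack or stack[0] not in GRAMMAR:
        yield stack
        return
    for alternative in GRAMMAR[stack[0]]:
        yield from expand(tuple(alternative) + stack[1:])


def next_phrases(prefix):
    """Return the menu phrases that may follow the phrases chosen so far."""
    states = set(expand(("S",)))
    for phrase in prefix:
        states = {
            successor
            for stack in states
            if stack and stack[0] == phrase
            for successor in expand(stack[1:])
        }
    # An empty stack means the sentence may end here.
    return {stack[0] if stack else "<END>" for stack in states}
```

For example, after "Find parts" the active choices are the two "whose ..."
phrases plus the option to stop, which is why the user cannot compose a
sentence the system does not cover.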


S                -->  Find (<ENTITY>-ATTRS of) <entity>s (<ENTITY>-MODS)

<ENTITY>-ATTRS   -->  <ENTITY>-ATTR (and <ENTITY>-ATTRS)

<ENTITY>-MODS    -->  <ENTITY>-MOD (and <ENTITY>-MODS)

<ENTITY>-MOD     -->  whose <ATTR> is <entity>-<ATTR>-VALUE
                      where (ENTITY ATTR) in RETRIEVAL-TABLE-ATTRS


RETRIEVAL-TABLE-ATTRS  =  ((part part#) (part color) (part weight) ...)
part-color-value       =  {red blue ...}
part-weight            =  integer


EX  Find part# and weight of parts whose color is red


AN  ATTRIBUTED  GRAMMAR 
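
The point of the attributed form is that one template covers every entity
and attribute in the retrieval table. A minimal sketch of that expansion
follows, assuming the table is a list of (entity, attribute) pairs; the
function name instantiate and the rule layout are hypothetical, and the
optional constituents of S are omitted for brevity.

```python
def instantiate(table_attrs):
    """Expand the <ENTITY>/<ATTR> templates into concrete grammar rules."""
    by_entity = {}
    for entity, attr in table_attrs:
        by_entity.setdefault(entity, []).append(attr)

    rules = {}
    for entity, attrs in by_entity.items():
        name = entity.upper()
        # <ENTITY>-ATTR: one alternative per attribute of the entity
        rules[f"{name}-ATTR"] = [[a] for a in attrs]
        rules[f"{name}-ATTRS"] = [[f"{name}-ATTR"],
                                  [f"{name}-ATTR", "and", f"{name}-ATTRS"]]
        # <ENTITY>-MOD: "whose <ATTR> is <entity>-<ATTR>-VALUE"
        rules[f"{name}-MOD"] = [[f"whose {a} is", f"{entity}-{a}-value"]
                                for a in attrs]
        rules[f"{name}-MODS"] = [[f"{name}-MOD"],
                                 [f"{name}-MOD", "and", f"{name}-MODS"]]
    # S: only the fully expanded form is generated in this sketch
    rules["S"] = [["Find", f"{e.upper()}-ATTRS", "of", f"{e}s", f"{e.upper()}-MODS"]
                  for e in by_entity]
    return rules
```

Adding a pair such as (supplier city) to the table yields the SUPPLIER rules
with no change to the templates, which is the sense in which the grammar is
application independent.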


CONVENTIONAL NATURAL LANGUAGE       MENU-BASED NATURAL LANGUAGE

recognition paradigm                generative paradigm
must handle ill-formed input        competence = performance
open-ended methodology              closed, controlled methodology
                                    control advantages of the NL
                                    generation paradigm

1-30 man-months per application     1-30 man-hours per application
                                    (see Chapter 3)

expensive to build and maintain     much cheaper to build and
applications                        maintain applications

requires specially trained users    end users can build their own
to build usable interfaces to       simple, usable interfaces to an
new applications                    important class of applications
                                    with less training

large grammars and lexicons         small grammars and lexicons

requires large memories and         requires small memories and
a larger processor burden           runs comfortably on a PC


Figure 1-14:  Advantages of Menu-Based Natural Language
for the Interface Designer


PROBLEM  III: 

o  Grammar  Writer’s  Toolbench 
o  Guided  SQL,  Dow  Jones,  PC  Focus 
o  Complex  Interfaces 
o  "Define"  and  Anaphora 
o  Information  Retrieval  Queries 
o  Graph- valued  Queries 
o  Spatial  DBMS  Queries 

o  The "Value Recognition" Problem & Library of Experts
o  Multiple Target DBMSs
o  Dynamic Attributed Grammar Lookahead Parser


GRAMMAR  WRITER'S  TOOLKIT 

well-formedness  of  the  grammar  and  lexicon 

o  syntax errors in grammar/lexicon/spec

o  find  dangling  references 

grammar  tracing  tools:  batch  tools 

sentence  generator 
mouse-sensitive  parse  tree 

screen  configuration  tools 

edit  items,  experts,  help  interactively 

limited  interactive  define  capability 

data  collection  tools  (in  progress) 

interface  development  environment  (on  the  PC) 

interface  generator  for  special  domains 


A TOY INFORMATION RETRIEVAL GRAMMAR FOR NLMENU


S               -->  Find REFERENCES-NP ( Price them ) ( Order them )

REFERENCES-NP   -->  { references <reference_subtypes> } ( REFERENCE-MODS )

REFERENCE-MODS  -->  REFERENCE-MOD ( and REFERENCE-MODS )

REFERENCE-MOD   -->  whose_authors_include <authors expert>
                -->  whose_time_of_publishing_was
                     {[between <date> and <date>] [before <date>] [after <date>]}
                -->  whose_topic_involves SEARCH-TERMS

SEARCH-TERMS    -->  <search_term_typein> ({and or but not} SEARCH-TERMS)


EX  Find journal articles and papers in conferences
    whose authors include THOMPSON, C* and
    whose topic includes "menu-based" but not "update"


THE  MAIN  ADVANTAGE  here  is  that  the  user  does  not  need  to  learn  the 
syntax  of  the  target  query  languages  or  their  semantic  capabilities. 

THE  MAIN  DISADVANTAGE  is  that  a  grammar  approach  by  itself  may  be 
too  constraining.  Some  "direct  manipulation"  approach  to  narrowing 
the  query  set  may  be  better. 
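
To make the advantage concrete, here is a hedged sketch of how a query
composed from the toy grammar might be turned into a Boolean search string
that the user never has to write. The AUTHOR= and TOPIC= syntax is invented
for illustration and is not the syntax of any particular retrieval service;
the function names and the query layout are likewise assumptions.

```python
# Connectives offered in the SEARCH-TERMS rule, mapped to Boolean operators.
OPERATORS = {"and": "AND", "or": "OR", "but not": "NOT"}


def search_terms(terms):
    """Render ['a', ('but not', 'b'), ...] as '"a" NOT "b" ...'."""
    first, *rest = terms
    rendered = f'"{first}"'
    for connective, term in rest:
        rendered += f' {OPERATORS[connective]} "{term}"'
    return rendered


def translate(query):
    """Translate a composed query (a dict) into a hypothetical search string."""
    clauses = []
    for author in query.get("authors", []):
        clauses.append(f'AUTHOR="{author}"')
    if "topic" in query:
        clauses.append(f"TOPIC=({search_terms(query['topic'])})")
    return " AND ".join(clauses)
```

Because the mapping lives in the lexicon rather than in the user's head,
retargeting the interface means rewriting these translations, not retraining
the user.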


Figure 9: A Bar Chart of the doubles and homeruns by name for batters
whose individual batting average is greater than 0.265


SAMPLE  SPATIAL  QUERIES 


EX  Find homes which are located in Plano east of Central and
               which range in price from 70K to 100K and
               which are "for sale by owner" and
               which have 4 bedrooms and
               which are located within 1/2 mile of an elementary school
    and draw them


EX  Find lakes whose size is greater than 5 acres and
               whose boundary is more than 50% owned and
               which are located less than 2 miles from I30 and I35 and
               which are located less than 30 miles from Dallas
    and draw them


EX  Draw a posted map on paper
    using photodot and using scale 1" = 2000'
    that shows tight hole wells
    whose location is between 30 and 31 latitude and
                              80 and 81 longitude,
    that were drilled by Texaco
    that were drilled between May 1, 1970 and May 1, 1980,
    that show oil deeper than 2000',
    that have well depth deeper than 5000',
    that are now operated by Shell,
    that are wildcat wells,
    that have a drilling problem,
    that have mechanical logs and
    that have oil analysis available        ; an Explorer-like query


[Screen image: NLMenu display window, Austin Restaurants]


SYSTEM BLOCK DIAGRAM


"Display highways passing through Dallas"

        |
        v

     NLMenu

        |
        v

(JOIN* FROM (CITY HIGHWAY)
       WHERE (AND (EQUAL CITY.NAME "DALLAS")
                  (INTERSECT.P CITY.BOUNDARY HIGHWAY.BOUNDARY))
       TUPLES T
       PROJECT (CITY.BOUNDARY HIGHWAY.BOUNDARY))

        |
        v

RTMS Database with GWIN graphic objects

City Table          Highway Table
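
The translation in the block diagram is a join of the city and highway
tables restricted by a name test and a spatial intersection test, projected
onto the two boundary fields. A rough Python equivalent is sketched below.
It is not RTMS or GWIN code: boundaries are modeled as sets of grid cells
purely for illustration, and the table contents and function names are made
up.

```python
# Toy tables; each boundary is a set of (x, y) grid cells (an assumption).
CITY = [
    {"name": "DALLAS", "boundary": {(0, 0), (0, 1), (1, 0), (1, 1)}},
    {"name": "AUSTIN", "boundary": {(5, 5), (5, 6)}},
]
HIGHWAY = [
    {"name": "H1", "boundary": {(1, 1), (2, 1), (3, 1)}},   # crosses the DALLAS cells
    {"name": "H2", "boundary": {(5, 6), (6, 6)}},           # crosses the AUSTIN cells
    {"name": "H3", "boundary": {(9, 9)}},                   # crosses neither city
]


def intersect_p(a, b):
    """Spatial predicate: true when two boundaries share at least one cell."""
    return bool(a & b)


def highways_through(city_name):
    """Join CITY and HIGHWAY, keep intersecting pairs, project the boundaries."""
    return [
        (city["boundary"], highway["boundary"])
        for city in CITY
        for highway in HIGHWAY
        if city["name"] == city_name
        and intersect_p(city["boundary"], highway["boundary"])
    ]
```

The projected boundaries are what a graphics layer would then draw.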


THE  "VALUE  RECOGNITION"  PROBLEM 


o  THE  PROBLEM  involves  managing  that  part  of  the  lexicon 
concerned  with  database  values  in  queries  or  commands 

o  TWO  SUBPROBLEMS 

A.  recognizing  the  types  of  values 

B.  associating  NL  values  with  the  corresponding  DB  values 
o  TRADITIONAL  APPROACHES 

1.  put  values  in  the  lexicon 

2.  use  the  database  as  an  extension  of  the  lexicon 

3.  avoid  representing  values  by  using  value  patterns 

4.  avoid  representing  values  by  using  surrounding  context 

o  NLMENU  SOLUTION:  based  on  "interaction  experts" 

solve  problem  A  by  letting  user  choose  types 
solve  problem  B  by  supporting  value  specification 


o  BONUS:  a  library  of  experts 


HOW  EXPERTS  SOLVE  THE  "VALUE  RECOGNITION"  PROBLEM 


THE  PROBLEM  OF  LEXICAL  AMBIGUITY  OF  VALUE  TYPES  IS  AVOIDED 

at query composition, precise category terms are included
in the user's menus so there is no need to guess at a most
plausible category or engage in after-the-query, menu-based
clarification dialogue

o  mirrors detailed operational distinctions

EX  ships - whose weight is - <gross weight>
                            - <dead weight>

o  does not require the detailed application-specific
   knowledge that Schwartz suggested

EX  Show me - oil wells - whose drill date is - <drill date>
                        - whose well depth is - <well depth>
                        - whose map scale is - <map scale>
                        - whose location is - <location>


THE PROBLEM OF SPECIFYING VALID FIELD VALUES IS MADE TRACTABLE

the  selected  interaction  expert  then  pops  up  specialized 
displays,  tailored  to  the  type  of  a  data  item,  allowing 
fine  tuning  of  methods  for  specifying  values. 

Experts  can  support  the  user  using  the  same  techniques  that 
data  entry  interfaces  employ. 

o  Experts  can  have  HELP  messages  associated  with  them 
documenting  a  semantic  domain  or  attribute-role. 

o  Experts  can  validate  data  items  including  range  checking 
and  format  checking. 

o  Experts  can  handle  errors  associated  with  specifying 
invalid  values. 

o  Experts  can  convert  user  specified  values  from  an  external 
form  to  an  internal  encoded  form. 


o  Naturally, experts cannot guarantee that the user specifies
   the value he meant to, only that the value he specified is a
   valid domain value.
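
Two of the expert behaviors listed above, validation and conversion from an
external to an internal form, can be sketched as small classes. These are
illustrative assumptions, not the NLMENU experts themselves: the class
names, the accept method, and the example codes are hypothetical.

```python
class RangeExpert:
    """Accepts a typed-in whole number and checks it against a range."""

    def __init__(self, low, high, help_text=""):
        self.low, self.high, self.help_text = low, high, help_text

    def accept(self, raw):
        try:
            value = int(raw)                      # format check
        except ValueError:
            raise ValueError(f"{raw!r} is not a whole number") from None
        if not self.low <= value <= self.high:    # range check
            raise ValueError(f"{value} is outside {self.low}..{self.high}")
        return value


class CodedFieldExpert:
    """Converts a user-visible value to the code stored in the database."""

    def __init__(self, codes, help_text=""):
        self.codes, self.help_text = codes, help_text

    def accept(self, raw):
        key = raw.strip().lower()
        if key not in self.codes:
            raise ValueError(f"{raw!r} is not one of {sorted(self.codes)}")
        return self.codes[key]
```

Either expert rejects an invalid value at the moment it is entered, but, as
the slide notes, neither can tell whether a valid value is the one the user
intended.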




A  "LIBRARY  OF  EXPERTS"  YIELDS  PORTABILITY 


The NLMENU system defaults to simple experts and is customizable
as more complex experts are needed, so users pay a small price
for bringing up an application and a larger one only for the most
important applications.

When special purpose experts can be selected from the library
of experts, the cost of building new experts is reduced.

When special purpose experts must be added, they can be added
to the library to reduce the cost of building future applications.

TAXONOMY  OF  EXPERTS 

typein  experts 
calculator  menu  experts 
units  experts 
range  experts 

single/multiple  and/or  experts 
tree  menu  experts 
project  expert 
domain  expert 
coded  field  experts 
compound  field  expert 
icon  experts 
form  expert 
composite  expert 




INTERFACING  NLMENU  TO  MULTIPLE  DBMS' 


GENERAL SESSIONER MENU & SYSTEM RELATIONS & INTERFACE GENERATOR
-->  create, modify, keep track of, and destroy interfaces

a.  core grammar/lexicon with SQL translations

b.  core grammar/lexicon with RTMS translations

c.  core grammar/lexicon with PROLOG translations

d.  ...     << it takes 1+ days to add a new target DBMS

            << about 30 lexical translations must be rewritten


COMBINING INTERFACES TO THE SAME TARGET DBMS
-->  combine PORTABLE SPECS
     or
     combine generated grammars and lexicons


COMBINE INTERFACES TO DIFFERENT TARGET DBMSs

-->  easy case:  no spanning queries -- combine grammars

     harder case:  spanning queries -- join phrase has a semantics that
                   requires copying from dbms1 to
                   dbms2.  Not implemented yet.
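
One way to picture the shared core grammar with per-target translations is
a lexicon in which each menu phrase carries one translation template per
target system, so that adding a target means rewriting only the templates.
The sketch below is an assumption-laden illustration: the LEXICON layout and
the translate function are hypothetical, the SQL fragments are ordinary
WHERE conditions, and the "lisp" templates only imitate the prefix style
seen in the block diagram rather than real RTMS syntax.

```python
# Each menu phrase maps to one translation template per target system.
LEXICON = {
    "whose color is": {
        "sql": "COLOR = '{value}'",
        "lisp": '(EQUAL COLOR "{value}")',
    },
    "whose weight is greater than": {
        "sql": "WEIGHT > {value}",
        "lisp": "(GREATERP WEIGHT {value})",
    },
}


def translate(phrase, value, target):
    """Fill in the translation of one menu phrase for the chosen target."""
    try:
        template = LEXICON[phrase][target]
    except KeyError:
        raise KeyError(f"no {target!r} translation for {phrase!r}") from None
    return template.format(value=value)
```

The grammar and menus stay the same for every target; only the dictionary
values differ, which is consistent with the small number of lexical
translations quoted above.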




FUTURE  DIRECTIONS 


o  IMPROVED  COVERAGE 

o  better  grammar,  database  and  system  coverage  is  possible 

EX  cover  INTELLECT 
EX  French  NLMenu 
EX  cover  SQL 

EX  cover  linear  programming,  CAD  applications,  software  dev 
EX  NLMenu  interfaces  to  expert  systems 
EX  NLMenu  database  update  grammars 

EX  NLMenu  interfaces  to  multiple  remote  heterogeneous 
databases  like  FOCUS,  SQL,  and  Dialog 
EX  NLMenu  as  the  hub  of  the  information  center 

o  CONTEXT SENSITIVITY & STRUCTURE EDITING

o  when to apply semantics

o  SIMULTANEOUS EXECUTION & CACHING SUBQUERIES

o  query reformulation is an iterative process

o  COOPERATIVE RESPONSE MEETS QUERY OPTIMIZATION

o  SIMULATION-VALUED QUERIES & REAL-TIME DATABASE APPLICATIONS

o  NATURAL LANGUAGE INSPECTOR & ACTIVE MULTI-MEDIA MAIL

o  INFORMATION  PRESENTATION 

o  mixing  NL  and  displays  of  forms,  trees,  etc 

o  SUBLANGUAGE  MODULARITY 

o  combining  interfaces,  turning  grammar  modules  on/off 

o  LIMITATIONS & HUMAN FACTORS TESTING

o  can  end-users  translate  their  queries  to  NLMenu 
o  can  end-users  build  their  own  interfaces 
o  are  NLMenu  interfaces  effective  and  useful? 



SUMMARY  OF  PLENARY  WORKSHOP  ON  EXPERT  KNOWLEDGE  SYSTEMS 

Linda  C.  Smith 
University  of  Illinois 




Plenary  Workshops  on  Expert  Knowledge  Systems 


The  discussions  on  expert  knowledge  systems  were  wide-ranging,  not  limited  to 
the  questions  on  this  topic  posed  by  the  conference  organizers.  Although 
difficult  to  summarize,  major  points  in  the  discussion  can  be  presented  in  a 
question-and-answer  format,  reflecting  the  questions  which  were  in  fact 
discussed. 

1. What is an expert system?

The  discussion  demonstrated  that  there  was  not  a  shared  view  of  what  was 
meant  by  the  term  "expert  system."  Craig  Thompson  observed  that  the  term  is  not 
used  consistently  in  the  AI  community — it  can  denote  either  the  rule-based 
system  providing  expert  assistance  in  a  narrow  domain  or  any  component  of  a 
system  which  embodies  knowledge  and  performs  intelligently  on  a  particular 
subtask.  It  is  likely  that  the  latter  sense  rather  than  the  former  is  what  may 
apply  in  enhancing  information  retrieval  systems.  In  trying  to  map  out  possible 
relationships  between  AI  and  IR,  we  should  not  prematurely  focus  only  on  expert 
systems,  but  consider  the  broader  range  of  AI  tools  and  techniques  which  are 
becoming  available.  To  date,  AI  researchers  have  focused  on  question-answering 
systems,  but  those  of  us  in  bibliographic  information  retrieval  think  that  it  is 
an  interesting  and  challenging  domain  worthy  of  attention  as  well. 

2.  What  are  the  contents  of  the  knowledge  base? 

In  the  narrow  definition  of  expert  system,  the  focus  is  on  encoding 
expertise which resides in the minds of experts. But in information retrieval,
the  knowledge  base  to  which  we  want  to  provide  access  is  multi-faceted: 
bibliographic,  numeric,  factual,  full  text,  graphics  (even  notes  on  napkins). 

3. What is an "expert" searcher? What do we know about searcher expertise?

Research  studies  of  searcher  behavior  and  performance  have  begun  to  give  us 
an  understanding  of  how  searchers  interact  with  online  systems,  but  there  is  a 
need  for  more  study  of  cognitive  processes  used  in  search  strategy  development 
and  how  these  affect  the  outcome  of  a  search.  Expertise  may  be  tied  to 
particular  databases  and/or  types  of  search  requests.  The  notion  of  a  "script" 
is  one  example  of  an  effort  being  made  to  capture  the  knowledge  of  an  expert 
searcher  in  a  form  which  others  can  use. 

4.  How  do  we  try  to  implement  enhancements  to  existing  retrieval  systems? 

In  this  context  it  is  useful  to  remember  two  methods  identified  in  AI 
research:  simulation/modeling  where  one  models  the  human  searcher's  approach  to 
the  problem  vs.  performance  where  one  uses  machine-based  techniques  which  do  not 
model  human  techniques  but  which  lead  to  improved  performance. 




5. How should we evaluate the performance of an expert IR system?

One possible model is by analogy to Turing's test: if the results of a
search performed by the end user with an expert retrieval system are comparable
to those achievable by a human expert searcher, then it is reasonable to use the
retrieval system in lieu of the human intermediary.

6.  What  are  the  questions? 

In  the  past  we  have  perhaps  focused  our  attention  too  much  on  the  answers 
(as  embodied  in  databases  of  various  kinds)  rather  than  on  the  questions.  What 
are  the  sorts  of  questions  people  ask?  Why  are  they  looking  for  information? 
What  level  of  information  is  required?  Which  questions  are  amenable  to 
processing  by  existing  IR  systems?  Which  could  be  processed  by  an  enhanced  or 
expert  IR  system?  To  match  new  and  more  powerful  tools  to  the  needs  of  users, 
we  need  a  better  understanding  of  what  questions  people  ask.  Reference 
librarians  and  expert  online  searchers  are  a  resource  to  be  tapped  in  helping  us 
better  understand  the  types  of  questions  people  have. 

In  conclusion,  we  did  not  spend  time  talking  about  available  tools  (e.g.,  AI 
machines  as  described  in  the  October  1  issue  of  Business  Week  or  the  expert 
system  software  as  marketed  for  microcomputers,  minicomputers,  and  mainframes). 
The  discussion  groups  concluded  that  use  of  these  tools  must  be  preceded  by 
study  of  human  factors— understanding  information  retrieval  as  a  problem  domain 
and  user  needs— as  suggested  by  the  questions  and  answers  used  to  structure  this 
summary. 


Computer  Interfaces  and  Intermediaries  for 
Information  Retrieval 
FINAL  AGENDA 


Introduction  -  Marjorie  Powell,  Program  Analyst, 

Defense  Technical  Information  Center 

Welcome  -  Richard  D.  Douglas,  Director,  Office  of 
Information  Systems  and  Technology 

Commerce  Energy  NASA  Defense  Information  Progress  Report  - 
Gladys  Cotter,  Technical  Information  Specialist,  Defense 
Technical  Information  Center 

Keynote  -  Martha  Williams,  Professor  of  Information  Science, 
Coordinated  Science  Laboratory  of  the  College  of  Engineering, 
University  of  Illinois 


Session  I 

Automated  Information  Systems: 

The  Human  Element 

Session  Moderator:  William  Bollinger,  Information  Specialist, 

Technology  Information  System,  Lawrence  Livermore  National  Laboratory 

Panel  Presentations  and  Discussion 

Panelists: 

Marcia  Bates,  Associate  Professor,  Graduate  School  of  Library  and 
Information  Science,  University  of  California  at  Los  Angeles 

Christine  Borgman,  Assistant  Professor,  Graduate  School  of  Library  and 
Information  Science,  University  of  California  at  Los  Angeles 

Emily  Fayen,  Director,  Library  Automation,  Baker  Library,  Dartmouth  College 

Carol  Fenichel,  Director  of  Library  Services,  Joseph  W.  England  Library, 
Philadelphia  College  of  Pharmacy  and  Science 

W. David Penniman, Director, Libraries and Information Systems, AT&T Bell Labs.


SESSION  II 

Command  Languages 

Session Moderator: William Leigh, College of Science and Technology,
University of Southern Mississippi

Panel  Discussion 

Panelists: 

Charles  Hildreth,  Office  of  Research,  OCLC,  Chairperson  of  National  Information 
Standards  Organization  (Z39)  Subcommittee  G,  Common  Command  Language  for  Use 
in  Interactive  Information  Retrieval 




Michael  Monahan,  GEAC  Computers  International,  Markham,  Ontario,  Canada 

Alan  E.  Negus,  Consultant  in  Information  Systems  and  Service,  Biggleswade  Beds., 
England 

Joint Presentation on Integration of Command Languages with Intelligent
Gateways:

Viktor E. Hampel, Project Leader, Technology Information System, Lawrence
Livermore National Laboratory

Hilary  Burton,  Project  Leader,  Interagency  Computer  Network,  Technology 
Information  System,  Lawrence  Livermore  National  Laboratory 

Vendor  Presentations 

Dana  Ellingen,  Menlo  Corporation,  In-Search 
Helen  Bell,  SDC,  Searchmaster 
Betty  Davis,  Informatics,  PC/NET 
Richard  Kollin,  Data-Ease,  IT 
David  Toliver,  ISI,  Sci-Mate 


Session  III 
Intermediary  Systems 

Session  Moderator:  Gladys  Cotter,  Technical  Information  Specialist,  Defense 
Technical  Information  Center 

Panel  Presentations 

Panelists: 

Rita Bergman, Branch Manager, Research and Systems Business Development,
Computer Corporation of America

Tamas  Doszkocs,  Special  Assistant  for  Research  and  Development,  Specialized 
Information  Services,  National  Library  of  Medicine 

Richard  Marcus,  Principal  Research  Scientist,  Laboratory  of  Information  Decision 
Systems,  M.I.T. 

David  Toliver,  Manager,  Software  Development,  ISI 

Session  IV 

Artificial  Intelligence,  Future  Directions 

Session  Moderator:  Hilary  Burton,  Project  Leader,  Interagency  Computer  Network, 
Technology  Information  System,  Lawrence  Livermore  National  Laboratory 

Panel  Presentation 

Panelists: 

Lionel  Bernstein,  President,  Knowledge  Systems,  Inc. 

Gabriel  Jakobson,  Senior  Member,  Technical  Staff,  Computer  Science  Laboratory, 
GTE  Laboratories 




Linda  Smith,  Associate  Professor  of  Library  Science,  University  of  Illinois 

Craig  Thompson,  Member  of  Technical  Staff,  Central  Research  Lab.,  Texas 
Instruments 

Plenary  Workshops 

The  following  topics  will  be  introduced  by  the  respective  chairpersons  to  the 
plenary  session: 

o  Common  Command  Language  -  Charles  Hildreth 

o  Front  Ends  -  Richard  Marcus 

o  Expert  Knowledge  Systems  -  Linda  Smith 

Attendees  will  be  asked  to  participate  in  all  three  workshops  on  a  rotating 
basis.  The  chairpersons  will  summarize  the  contributions  for  presentation  on 
Saturday  morning. 

Discussion  and  Resolutions 

Each  chairperson  will  present  a  summary  and  lead  a  discussion  on  the  respective 




AUTHOR  INDEX 


Bates, Marcia. Some Design Ideas for Subject Access
in Online Systems

Borgman, Christine. Human-Computer Interaction Research
and Information Retrieval Systems (abstract)

Burton, Hilary D. Integration of Common Command Languages
with Intelligent Gateways

Fayen, Emily. The User Interface: Some Preliminary Results
from the Dartmouth Online Catalog

Marcus, Richard. Intermediary Systems for Information
Retrieval

Penniman, W. David. Research in Search Models

Thompson, Craig. Menu-based Natural Language Interfaces

Smith, Linda. Summary of Plenary Workshop on
Expert Knowledge Systems


